STATISTICS FORMULAS
MEASURES OF VARIATION
TOPIC OUTLINE
i. BOX & WHISKER PLOT
ii. INTERQUARTILE RANGE
iii. STANDARD DEVIATION
iv. VARIANCE
v. EMPIRICAL RULE
vi. CHEBYSHEV'S THEOREM
vii. SKEWNESS
i. BOX & WHISKER PLOTS
A Box & Whisker Plot (Box Plot) is a graphical rendition of statistical data based on the Minimum, First Quartile (Q1), Median, Third Quartile (Q3), and Maximum.
[Figure: box plot labeled Minimum, First Quartile (Q1), Median, Third Quartile (Q3), Maximum]
ii. INTERQUARTILE RANGE
To find the interquartile range (IQR) of a box plot, subtract the first quartile from the third quartile: IQR = Q3 - Q1.
Example (box plot drawn on an axis from 0 to 20): Q1 = 5 and Q3 = 15, so IQR = 15 - 5 = 10. The interquartile range is 10.
iii. STANDARD DEVIATION
Standard Deviation measures how spread out numbers are, for either a POPULATION ($\sigma$: all members of a specific group) or a SAMPLE ($s$: some of the members of a population).
SAMPLE: $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$
POPULATION: $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$
$\bar{x}$ or $\mu$ = Mean
$N$ or $n$ = Number of Members
iv. VARIANCE
VARIANCE is the average of the squared differences from the mean, written $\sigma^2$ (population) or $s^2$ (sample).
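The sample and population formulas above can be sketched in Python as follows. The data set is hypothetical and chosen only for illustration; the function names are not from the original sheet.

```python
import math
import statistics

# Hypothetical data set, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

def sample_variance(values):
    """s^2: sum of squared deviations from the mean, divided by n - 1."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / (len(values) - 1)

def population_variance(values):
    """sigma^2: sum of squared deviations from the mean, divided by N."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)

# Standard deviation is the square root of the variance.
s = math.sqrt(sample_variance(data))
sigma = math.sqrt(population_variance(data))

# The standard library computes the same quantities directly.
assert math.isclose(s, statistics.stdev(data))
assert math.isclose(sigma, statistics.pstdev(data))
print(f"s = {s:.4f}, sigma = {sigma:.4f}")
```

Dividing by n - 1 rather than N makes the sample value slightly larger than the population value for the same numbers.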
v. EMPIRICAL RULE
The empirical rule provides 3 quick estimates of the spread of data in a normal distribution, given the mean and standard deviation:
68% of values fall within ONE standard deviation of the mean (about 34% on each side of the mean).
95% of values fall within TWO standard deviations of the mean (about 13.6% more on each side).
99.7% of values fall within THREE standard deviations of the mean (about 2.2% more on each side).
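As a quick check of the three percentages, the following sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The helper name `share_within` is an illustrative assumption.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

def share_within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.4f}")
```

The printed shares round to the 68%, 95%, and 99.7% quoted by the rule.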
vi. CHEBYSHEV'S THEOREM
Since the Empirical Rule applies only to bell-shaped data, Chebyshev's Theorem is used for data of any shape. CHEBYSHEV'S THEOREM states that the proportion of values in a data set that fall within k standard deviations of the mean is at least $1 - \frac{1}{k^2}$.
Chebyshev's Theorem Formula: $P \geq 1 - \frac{1}{k^2}$
P = Proportion of values in a data set that fall within k standard deviations of the mean
k = number of standard deviations away from the mean (always greater than 1)
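A minimal sketch of the bound, assuming only the formula above; the function name `chebyshev_lower_bound` is hypothetical.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        # The theorem only gives a useful bound for k greater than 1.
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k**2

for k in (2, 3):
    print(f"k = {k}: at least {chebyshev_lower_bound(k):.1%} of values")
```

For k = 2 the bound is 75%, noticeably weaker than the 95% of the Empirical Rule, because it has to hold for every distribution and not only bell-shaped ones.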
vii. SKEWNESS
Skewness quantifies how symmetrical the distribution is. Data can have a Positive Skew (Right-Skewed), a Negative Skew (Left-Skewed), or No Skew (No Shift).
POSITIVE SKEW: Mean > Mode
NEGATIVE SKEW: Mean < Mode
NO SKEW: Mean = Mode
PROBABILITY
TOPIC OUTLINE
i. SETS
ii. TYPES OF EVENTS
iii. RULES OF PROBABILITY
iv. THEORETICAL & EMPIRICAL PROBABILITY
v. PERMUTATION & COMBINATION PROBABILITY
i. SETS
A SET is a collection of elements. SETS are compared, to show how they relate to one another, by UNION, INTERSECTION, and COMPLEMENT.
UNION ($A \cup B$): all elements that are in either set
INTERSECTION ($A \cap B$): elements where the sets overlap
COMPLEMENT ($A^c$): elements not in the set
ii. TYPES OF EVENTS
INDEPENDENT EVENTS: events for which the occurrence of one has no impact on the occurrence of the other. Example: rolling a die twice; the results are independent.
DEPENDENT EVENTS: events that impact each other's probability. Example: drawing two cards, one at a time, without returning them to the deck.
MUTUALLY EXCLUSIVE EVENTS: two events that cannot occur simultaneously. Example: tossing a coin; the occurrences of heads and tails are mutually exclusive.
iii. RULES OF PROBABILITY
The three rules of probability are the (1) Complement Rule, (2) Addition Rule, and (3) Multiplication Rule.
1. COMPLEMENT RULE: $P(A^c) = 1 - P(A)$
2. ADDITION RULE: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
If A and B are mutually exclusive: $P(A \cup B) = P(A) + P(B)$
3. MULTIPLICATION RULE: $P(A \cap B) = P(A)\,P(B \mid A)$ or $P(B)\,P(A \mid B)$
If A and B are independent: $P(A \cap B) = P(A)\,P(B)$
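The three rules can be illustrated with a standard 52-card deck. The card example and variable names below are assumptions added for illustration; `fractions.Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

# Drawing one card from a standard 52-card deck.
p_heart = Fraction(13, 52)
p_king = Fraction(4, 52)
p_heart_and_king = Fraction(1, 52)  # only the king of hearts is both

# 1. Complement rule: P(not heart) = 1 - P(heart)
p_not_heart = 1 - p_heart

# 2. Addition rule: P(heart or king) = P(heart) + P(king) - P(heart and king)
p_heart_or_king = p_heart + p_king - p_heart_and_king

# 3. Multiplication rule for dependent events:
#    two aces in a row, drawn without returning the first card to the deck.
p_first_ace = Fraction(4, 52)
p_second_ace_given_first = Fraction(3, 51)
p_two_aces = p_first_ace * p_second_ace_given_first

print(p_not_heart, p_heart_or_king, p_two_aces)
```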
iv. THEORETICAL & EMPIRICAL PROBABILITY
THEORETICAL probability is what should happen in an experiment:
$P(E) = \frac{\text{Number of Favorable Outcomes}}{\text{Number of Possible Outcomes}}$
EMPIRICAL probability is what did happen in an experiment:
$P(E) = \frac{\text{Number of Times Event Occurred}}{\text{Number of Trials}}$
v. PERMUTATION & COMBINATION PROBABILITY
PERMUTATION counts all possible outcomes when order IS important:
$_nP_r = \frac{n!}{(n-r)!}$
COMBINATION counts all possible outcomes when order is NOT important:
$_nC_r = \frac{n!}{r!\,(n-r)!}$
n = Total Possibilities
r = Selected Possibilities
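A short sketch comparing the factorial formulas with `math.perm` and `math.comb` from the Python standard library (available from Python 3.8). The values n = 5 and r = 2 are hypothetical.

```python
import math

n, r = 5, 2  # hypothetical: choose 2 items out of 5

# Permutations: order matters, n! / (n - r)!
permutations = math.factorial(n) // math.factorial(n - r)

# Combinations: order does not matter, n! / (r! (n - r)!)
combinations = math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

# The built-in helpers give the same counts.
assert permutations == math.perm(n, r)
assert combinations == math.comb(n, r)
print(permutations, combinations)
```

Each combination of 2 items corresponds to 2! = 2 ordered permutations, which is why the permutation count is twice the combination count here.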
PROBABILITY DISTRIBUTION
TOPIC OUTLINE
i. BINOMIAL DISTRIBUTION
ii. NORMAL DISTRIBUTION
iii. Z-SCORE
iv. HOW TO READ STANDARD NORMAL TABLE
i. BINOMIAL DISTRIBUTION
The Binomial Distribution is a frequency distribution of the possible number of successful outcomes in a given number of trials, each of which has the same probability of success.
$P(x \text{ out of } n) = \binom{n}{x} p^x (1-p)^{n-x} = \frac{n!}{x!\,(n-x)!}\, p^x (1-p)^{n-x}$
n = Total Number of trials
x = Number of successes we want
p = probability of success on each trial
The Binomial Distribution is used for discrete variables, while the Normal Distribution is used for continuous variables.
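The binomial formula translates directly into code. This is a minimal sketch; the function name `binomial_pmf` and the coin-flip example are assumptions for illustration.

```python
import math

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n trials with success probability p."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

# Hypothetical example: exactly 3 heads in 5 flips of a fair coin.
print(binomial_pmf(3, 5, 0.5))

# The probabilities over all possible x (0 through n) add up to 1.
total = sum(binomial_pmf(x, 5, 0.5) for x in range(6))
print(total)
```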
ii. NORMAL DISTRIBUTION
The Normal Distribution is the most common probability distribution. Its curve is symmetrical with the mean at the center and never touches the horizontal axis. Using a z-score, it helps find the probability of a data value falling within a given number of standard deviations of the mean.
[Figure: BELL-SHAPED CURVE centered on the Mean; horizontal axis: Standard Deviations From the Mean (-3 to 3); areas 2.2%, 13.6%, 34%, 34%, 13.6%, 2.2%]
iii. Z-SCORE
A Z-Score represents how many standard deviations a data value is from the mean:
$z = \frac{x - \mu}{\sigma}$
[Figure: the same bell-shaped curve with the horizontal axis labeled in z-scores from -3 to 3]
iv. HOW TO READ STANDARD NORMAL TABLE
The Standard Normal Table shows the cumulative probability associated with a particular z-score.
Table ROWS show the whole number and tenths place of the z-score, and COLUMNS show the hundredths place.
Example: The probability to the left of z = 0.31 is 0.6217 (ROW 0.3, COLUMN .01).

Z     .00    .01    .02    .03
0.0   .5000  .5040  .5080  .5120
0.1   .5398  .5438  .5478  .5517
0.2   .5793  .5832  .5871  .5910
0.3   .6179  .6217  .6255  .6293
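The z-score formula and the table lookup can be reproduced in a few lines. This sketch assumes `statistics.NormalDist` (Python 3.8 or later); the function name `z_score` and the sample values passed to it are hypothetical.

```python
from statistics import NormalDist

def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev

# Cumulative probability to the left of z = 0.31,
# the same lookup as row 0.3, column .01 of the table.
left_of_z = NormalDist().cdf(0.31)
print(round(left_of_z, 4))

# Hypothetical value: x = 115 in a distribution with mean 100 and standard deviation 10.
print(z_score(115, 100, 10))
```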
HYPOTHESIS TESTING
TOPIC OUTLINE
i. WHAT IS HYPOTHESIS TESTING
ii. HYPOTHESIS TESTING STEPS
iii. ONE-TAILED & TWO-TAILED TESTS
iv. DECISION RULES
v. TEST STATISTICS
vi. DECISION ERRORS
i. HYPOTHESIS TESTING
Hypothesis Testing is the procedure of testing a statement by rejecting or accepting the null hypothesis.
[Figure: known distribution with a central "Accept Null Hypothesis" region and a "Reject Null Hypothesis" region, set by the Level of Significance, in each tail]
If the null hypothesis is rejected, then the alternative hypothesis is accepted.
Null Hypothesis ($H_0$) is a statement about the value of a population parameter.
Alternative Hypothesis ($H_1$ or $H_a$) is the statement that is accepted if the evidence leads to rejecting the null hypothesis.
ii. HYPOTHESIS TESTING STEPS
1. State the Null Hypothesis
2. State the Alternative Hypothesis
3. Set the Level of Significance
4. Collect Data
5. Calculate a test statistic
6. Specify the Regions of Acceptance
7. Reach a conclusion about the null hypothesis based on the data
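iii. ONE-TAILED & TWO-TAILED TESTS
In hypothesis testing, a One-Tailed Test is one in which the region of rejection is on only one side of the sampling distribution. A Two-Tailed Test is one in which the region of rejection is on both sides of the sampling distribution.
A One-Tailed Test is used when $H_1$ is $\mu > k$ or $\mu < k$:
$H_0: \mu = k$, $H_1: \mu > k$ (rejection region $\alpha$ in the right tail)
$H_0: \mu = k$, $H_1: \mu < k$ (rejection region $\alpha$ in the left tail)
A Two-Tailed Test is used when $H_1$ is $\mu \neq k$:
$H_0: \mu = k$, $H_1: \mu \neq k$ (rejection region $\alpha/2$ in each tail)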
iv. DECISION RULES
Decision rules for rejecting or accepting the null hypothesis are based on the P-value and the Region of Acceptance.
P-VALUE is the probability, assuming the null hypothesis is true, of obtaining a test statistic at least as extreme as the one observed. The null hypothesis is rejected when the p-value is smaller than the level of significance $\alpha$. The quantity 1 - p-value is sometimes loosely read as a level of confidence, but it is not the probability that the alternative hypothesis is true.
REGION OF ACCEPTANCE is the range of test statistic values for which the null hypothesis is not rejected. If the test statistic falls within this range, the null hypothesis is accepted. If the test statistic falls outside this range, it is rejected, which is written as "The hypothesis has been rejected at the $\alpha$ level of significance."
v. TEST STATISTICS
A Test Statistic is used to assess the strength of the evidence for or against a null hypothesis, using either a Z-test or a t-test.
Z-Test is used for testing the mean of a population against a standard, or for comparing the means of two populations, with large ($n \geq 30$) samples:
$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$
t-Test is similar to the z-test, but t-tests are used with small samples ($n < 30$):
$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$
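Both test statistics share the same shape: the distance between the sample mean and the hypothesized mean, measured in standard errors. The sketch below uses hypothetical numbers and function names. Critical values still have to come from a z or t table, since the Python standard library has no t distribution.

```python
import math

def z_statistic(sample_mean, mu0, sigma, n):
    """z = (x_bar - mu0) / (sigma / sqrt(n)), using the population standard deviation."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))

def t_statistic(sample_mean, mu0, s, n):
    """t = (x_bar - mu0) / (s / sqrt(n)), using the sample standard deviation."""
    return (sample_mean - mu0) / (s / math.sqrt(n))

# Hypothetical sample: mean 52 from n = 36 observations, tested against mu0 = 50.
print(z_statistic(52, 50, 6, 36))  # population standard deviation assumed known (6)
print(t_statistic(52, 50, 6, 36))  # same numbers, treating 6 as the sample standard deviation
```

The two functions compute the same ratio; what differs is which standard deviation is available and which table the result is compared against.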
vi. DECISION ERRORS
Type I error occurs when the researcher rejects a null hypothesis that is true. Type II error occurs when the researcher fails to reject a null hypothesis that is false.

Decision       H0 is True         H0 is False
Accept H0      Correct Decision   Type II error
Reject H0      Type I error       Correct Decision

Significance Level ($\alpha$) is the probability of making a Type I error, and the Power of a hypothesis test is the probability of not committing a Type II error.
REGRESSION
TOPIC OUTLINE
i. CORRELATION
ii. TYPES OF CORRELATION
iii. SIMPLE REGRESSION
iv. MULTIPLE REGRESSION
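i. CORRELATION
Correlation is a measure of association between two variables, which can be either dependent or independent.
Correlation Formula:
$r_{xy} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \, \sum_{i=1}^{n}(y_i - \bar{y})^2}}$
Equivalent computational form:
$r_{xy} = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left(n\sum x^2 - (\sum x)^2\right)\left(n\sum y^2 - (\sum y)^2\right)}}$
$n$ = Number of pairs of scores
$\sum xy$ = Sum of the products of paired scores
$\sum x$ = Sum of x scores
$\sum y$ = Sum of y scores
$\sum x^2$ = Sum of squared x scores
$\sum y^2$ = Sum of squared y scores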
ii. TYPES OF CORRELATION
The correlation coefficient measures the strength of the relationship between two variables and ranges from -1 to 1. If there is no correlation, the correlation coefficient is 0.
POSITIVE CORRELATION: As one variable increases, the other increases.
NEGATIVE CORRELATION: As one variable increases, the other decreases.
PERFECT CORRELATION: The strongest relationship, where the correlation coefficient is either 1 or -1.
NO CORRELATION: There is no relationship between the two variables.
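A minimal sketch of the correlation formula, applied to the age and heart-rate values used in this sheet's simple regression example. The function name `pearson_r` is an assumption for illustration.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient from deviations about the two means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_sq_x = sum((x - mean_x) ** 2 for x in xs)
    sum_sq_y = sum((y - mean_y) ** 2 for y in ys)
    return numerator / math.sqrt(sum_sq_x * sum_sq_y)

ages = [20, 30, 40, 50, 60]
heart_rates = [192, 185, 178, 172, 165]

r = pearson_r(ages, heart_rates)
print(f"r = {r:.4f}")
```

The result is very close to -1, a strong negative correlation: heart rate falls almost linearly as age rises.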
iii. SIMPLE REGRESSION
Simple regression is used to examine the relationship between one dependent and one independent variable. Regression goes beyond correlation by predicting what will happen to the dependent variable if the independent variable changes.
EXAMPLE

Age (x):         20   30   40   50   60
Heart Rate (y):  192  185  178  172  165

In this example, heart rate is the dependent variable because it depends on age. The relationship shows that as Age increases, the person's Heart Rate decreases. So a prediction would suggest that the older someone is, the lower their heart rate will be.
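One common way to turn this example into a prediction is an ordinary least-squares line. The sheet does not name a fitting method, so least squares is an assumption here, as are the function names.

```python
ages = [20, 30, 40, 50, 60]
heart_rates = [192, 185, 178, 172, 165]

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(ages, heart_rates)

def predict_heart_rate(age):
    """Predicted heart rate for a given age from the fitted line."""
    return intercept + slope * age

print(f"heart rate = {intercept:.1f} + ({slope:.2f}) * age")
print(f"predicted heart rate at age 45: {predict_heart_rate(45):.2f}")
```

The negative slope matches the description above: each additional year of age lowers the predicted heart rate.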
iv. MULTIPLE REGRESSION
MULTIPLE REGRESSION is an extension of simple linear regression. It is used to predict the value of a Dependent Variable based on the values of two or more Independent Variables.
EXAMPLE

Heart Rate (y):  192   185   178   172   165
Age (x):         20    30    40    50    60
Height (x):      5'7"  6'4"  5'9"  5'8"  5'4"
Gender (x):      M     M     M     F     F

In this example, Age, Height, and Gender are the Independent Variables, and Heart Rate is the Dependent Variable predicted from them. A Multiple Regression would be able to predict a person's heart rate based on their Age, Height, and Gender.
STATISTICS KEY TERMS
Key Terms Outline: Population, Sample, Statistics, Inferential Statistics, Descriptive Statistics, Statistic, Census, Variable, Discrete Variable, Continuous Variable, Nominal Variable, Ordinal Variable, Control Group, Treatment Group, Bias, Controlled Experiment, Observational Study, Placebo, Blind Experiment, Double-Blind Experiment, Homogeneous Group, Heterogeneous Group, Random Sampling, Convenience Sampling, Systematic Sampling, Stratified Sampling, Cluster Sampling, Degrees of Freedom
POPULATION: A group of elements (people, objects, etc.) whose measurements are of interest.
SAMPLE: A subset of a population on which data is collected in order to learn more about the population.
STATISTICS: A scientific field of study that provides approaches to making inferences about populations based on the examination of smaller sets of data.
INFERENTIAL STATISTICS: The large body of techniques that are based on probability and reasoning and that use sample data to make formal inferences (conclusions) about the population from which the sample was selected.
DESCRIPTIVE STATISTICS: The large body of techniques that are used primarily to summarize sample data, including graphical techniques (plots and graphs that visually illustrate data) and calculated statistical values.
STATISTIC: A value calculated from sample data.
CENSUS: The identification and measurement of all individuals in a population (the sample is the entire population).
VARIABLE: The representation of some characteristic of an individual that can be measured or recorded and generally can take on multiple values.
DISCRETE VARIABLE: A variable that has a limited or countable number of possible values. Example: the number of pets in a household.
CONTINUOUS VARIABLE: A variable that has an unlimited and uncountable number of possible values. Examples: height, time.
NOMINAL VARIABLE: A classification variable whose values do not have an intrinsic ordering; contains the least amount of information. Examples: colors, gender, long-distance phone company.
ORDINAL VARIABLE: An ordered variable for which the distances between possible values do not have any meaning or comparability; contains the second-lowest amount of information. Examples: chili heat scale, football rankings, credit risk rating.
CONTROL GROUP: The group of subjects in a study that does not receive the treatment.
TREATMENT GROUP: The group of subjects in a study that does receive the treatment.
BIAS: An influence on the outcome of a study that is unanticipated or unaccounted for.
CONTROLLED EXPERIMENT: A study that has a control group and in which researchers control which subjects receive the treatment.
OBSERVATIONAL STUDY: A study in which the researchers cannot control which subjects receive the treatment, but can only observe the outcome of the treatment on the subjects who did and did not receive it, for whatever reason. Outside factors cannot be controlled.
PLACEBO: A fake treatment, often given to people in a control group so that the subjects in the control group and in the treatment group do not know which group they are really in.
BLIND EXPERIMENT: An experiment in which subjects do not know if they are receiving the treatment (for example, because a placebo is given).
DOUBLE-BLIND EXPERIMENT: An experiment in which neither the subjects nor the people who examine the subjects know if a subject is receiving the treatment.
HOMOGENEOUS GROUP: A group in which the subjects are very similar in every aspect that might influence the outcome being investigated in a study.
HETEROGENEOUS GROUP: A group in which the subjects are very diverse in important aspects that might influence the outcome being investigated in a study.
RANDOM SAMPLING: Any method of selecting a sample in which the probability of inclusion in the sample can be calculated for any individual in the target population.
CONVENIENCE SAMPLING: A method that chooses the individuals who are easiest to reach.
SYSTEMATIC SAMPLING: A method in which every nth member of the population is selected after the population has been arranged by some characteristic.
STRATIFIED SAMPLING: A method in which the population is divided into subpopulations (strata) and a sample is taken from each stratum in proportion to its size.
CLUSTER SAMPLING: A sampling technique in which clusters of participants that represent the population are used.
DEGREES OF FREEDOM: Degrees of freedom, abbreviated df, is the number of observations minus the number of restrictions.
STATISTICS PRACTICE PROBLEMS
Question 1: Statistics is the study of:
A. Testing and interpreting statistical hypotheses about a relationship
B. Mathematical analysis using samples instead of populations
C. Summarizing, analyzing, or drawing inferences about a relationship
D. Inferring something about a population from a sample
ANSWER: Statistics is the study of summarizing, analyzing, or drawing inferences about a relationship. Answer: C

Question 2: Which of the following is not a variable?
A. Personal fantasies
B. The final score of the Knicks game on 3/31/16
C. Body temperatures of people who are NOT sick
D. Age
ANSWER: Personal fantasies, body temperatures of people who are NOT sick, and age are all variables. Answer: B

Question 3: What kind of variable is a person's height?
A. Discrete
B. Continuous
C. Ungrouped
D. Grouped
ANSWER: Answer: B. Continuous
Question 4: In a positively skewed distribution:
A. The bulk of the scores are on the left side
B. The bulk of the scores are on the right side
C. The bulk of the scores are in the middle
D. There is no bulk of scores
ANSWER: In a positively skewed distribution, the bulk of the scores are on the left side. Answer: A

Question 5: Kurtosis is the measure of:
A. The amount that the peak is shifted positively or negatively
B. The shape of the distribution's peak
C. Whether the distribution has enough scores (N)
D. The cumulative relative frequency
ANSWER: Kurtosis is the measure of the shape of the distribution's peak. Answer: B

Question 6: The median is a better choice of measuring central tendency than the mean when:
A. The distribution is bimodal
B. The distribution is heavily positively skewed
C. The distribution is leptokurtic
D. The distribution is a normal curve
ANSWER: The median is a better choice of measuring central tendency than the mean when the distribution is heavily positively skewed. Answer: B
Question 7: Why do we use standard deviation instead of variance?
A. Standard deviation takes into account the N of the sample
B. Standard deviation is less susceptible to outlier scores than the variance
C. Standard deviation has the same scale of measurement as the mean
ANSWER: Standard deviation is used instead of variance because standard deviation has the same scale of measurement as the mean. Answer: C

Question 8: Which of the following is NOT a requirement of a perfect normal distribution?
A. Identical mean, median, and mode
B. Asymptotic distribution
C. Has a standard deviation of 1
D. Has a range of negative infinity to positive infinity
ANSWER: A perfect normal distribution has an identical mean, median, and mode, an asymptotic distribution, and a range of negative infinity to positive infinity. Answer: C

Question 9: Which of the following is FALSE about standard normal distributions?
A. Standard deviation = 1
B. Mean = 0
C. Easier to compare values across variables
D. Converting scores to a standard normal distribution turns the distribution into a perfect normal distribution
ANSWER: Standard deviation = 1, mean = 0, and easier to compare values across variables are all true about standard normal distributions. Answer: D
Question 10: If you were to look at the scores on a test from a single classroom and assume the findings apply to the whole school, you would be using:
A. Descriptive statistics
B. Inferential statistics
C. Sample statistics
D. Population statistics
ANSWER: Answer: B. Inferential statistics

Question 11: Which of these is the least effective type of sampling?
A. Snowball sampling
B. Convenience sampling
C. Random sampling
D. Stratified random sampling
ANSWER: Of the types listed, convenience sampling is the least effective form of sampling. Answer: B

Question 12: Which of the following is false about H0 and H1?
A. They must be mutually exclusive
B. They must be appropriate for the rejection criteria
C. There can be no overlap or exceptions
D. They must be all-encompassing
ANSWER: "They must be mutually exclusive," "There can be no overlap or exceptions," and "They must be all-encompassing" are all true about H0 and H1. Answer: B

Question 13: When testing something that has very little room for error, what is the best choice to set alpha to?
A. .005
B. .10
C. .95
D. .995
ANSWER: Alpha refers to the significance level, and .005 is the lowest of the numbers given. Answer: A
Question 14: _____ is when you accept the null hypothesis when it's false.
A. Type 1 Error
B. Type 2 Error
C. H0 Error
D. H1 Error
ANSWER: A Type 2 Error is when you accept the null hypothesis when it's false. Answer: B
Question 15: _____ is used when you have a sample mean, estimated standard error, population mean, and N.
A. Independent-samples t-test
B. Repeated-measures t-test
C. One-sample t-test
D. Bivariate t-test
ANSWER: A one-sample t-test is used when you have a sample mean, estimated standard error, population mean, and N. Answer: C

Question 16: If you're doing a t-test with N = 30 and you want to use 95% confidence, you should look up the t-table value at:
A. 29 and .05
B. 29 and .95
C. 30 and .05
D. 30 and .95
ANSWER: The degrees of freedom are N - 1 = 29, and 95% confidence corresponds to an alpha of .05. Answer: A. 29 and .05

Question 17: If you're doing a t-test and your N = 32, you should look up the t-table value at:
A. 30
B. 40
C. Average of 30 and 40
D. Average of 30 and 40, with 80% weight on the 30
ANSWER: The degrees of freedom are N - 1 = 31; when the table has no row for 31, use the next lower row. Answer: A. 30
USE THE FOLLOWING SCENARIO FOR QUESTION 18: A group of statistics professors decides to evaluate the effect of milk consumption on height by monitoring the height of students in a classroom where milk is supplied daily and the height of students in a classroom where milk is not supplied daily. The professors set their rejection criterion to t = ±2.045 with an alpha of .05. After running their analysis, they find that their independent-samples t-test returned t = 1.940.

Question 18: What conclusion should they reach, and what error are they at risk of committing?
A. Accept null hypothesis; Type 1 Error
B. Accept null hypothesis; Type 2 Error
C. Reject null hypothesis; Type 1 Error
D. Reject null hypothesis; Type 2 Error
ANSWER: Since 1.940 is less than 2.045, the test statistic falls inside the region of acceptance, so the only possible error is failing to reject a false null hypothesis. Answer: B. Accept null hypothesis; Type 2 Error
Question 19 A die is tossed 24 times and lands on 1 three times Find the theoretical probability of landing on a 1 A B C D 24 3 24 1 1 6
ANSWER: There is a 1/6 chance that the die would land on 1, so the theoretical probability is 1/6. Answer: D. 1/6
Question 20 A die is tossed 24 times and lands on 1 three times Find the experimental probability of landing on a 1 A B C D 1 8 3 24 1 24
ANSWER: The die landed on 1 three times out of 24 total tosses, so the experimental probability is 3/24, which simplifies to 1/8. Answer: A. 1/8
Question 21: The numbers of restaurants in 11 malls in a city are 2, 4, 7, 7, 9, 9, 9, 10, 15, 18, 20. What are the mean, median, and mode?
ANSWER: Mean = sum of all samples divided by total number of samples.
Mean = (2 + 4 + 7 + 7 + 9 + 9 + 9 + 10 + 15 + 18 + 20) / 11 = 110 / 11 = 10
Median is the middle number in the ordered data, which is 9.
Mode is the number that appears most often, which is 9.
Mean = 10, Median = 9, Mode = 9
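Question 22: The numbers of pets in 12 households on one block are 0, 0, 1, 1, 2, 3, 3, 3, 4, 4, 5, 7. What are the variance and standard deviation in this sample?
ANSWER: Plug into the variance formula, with $\bar{X} = 33/12 = 2.75$:
$s^2 = \frac{\sum (X - \bar{X})^2}{N - 1} = \frac{48.25}{11} \approx 4.3864$
Take the square root of the variance to find the standard deviation:
$s = \sqrt{4.3864} \approx 2.094$
$s^2 \approx 4.3864$, $s \approx 2.094$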
USE THE FOLLOWING SCENARIO FOR QUESTION 23: A designer brand is designing a new pair of exclusive sneakers for men. Men have shoe sizes that are normally distributed with a mean of 6.0 in and a standard deviation of 1.0 in. Due to financial constraints, the sneakers will be designed to fit all men except those with shoe sizes in the smallest 2.5% or largest 2.5%.

Question 23: Find the minimum and maximum shoe sizes that will fit men.
ANSWER: Find the z-scores that bound the central part of the population: 100% - 2.5% - 2.5% = 95% of the population, so z = -1.96 and z = +1.96.
Now plug each z-score into $x = \mu + z\sigma$ to find the minimum and maximum:
x = 6 + (-1.96)(1) = 4.04, so Minimum = 4.04
x = 6 + (1.96)(1) = 7.96, so Maximum = 7.96
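The same cutoffs can be checked without a table. This is a sketch assuming `statistics.NormalDist` (Python 3.8 or later), whose `inv_cdf` method returns the value below which a given proportion of the distribution lies; the variable names are illustrative.

```python
from statistics import NormalDist

# Shoe sizes from the scenario: mean 6.0, standard deviation 1.0.
shoe_sizes = NormalDist(mu=6.0, sigma=1.0)

# Exclude the smallest 2.5% and the largest 2.5%.
minimum_size = shoe_sizes.inv_cdf(0.025)
maximum_size = shoe_sizes.inv_cdf(0.975)

print(f"minimum = {minimum_size:.2f}, maximum = {maximum_size:.2f}")
```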
USE THE FOLLOWING SCENARIO FOR QUESTION 24: A bar allows its customers to keep a tab. Among 35 randomly selected customers with a tab at the bar, the mean amount owed was $175.37 and the standard deviation was $84.77.

Question 24: Using a significance level of 0.05, test the claim that the mean amount owed by all customers is greater than $150.00.
ANSWER: Create a null and an alternative hypothesis:
Ho: μ = 150
Ha: μ > 150
Next, run a t-test:
t = (x̄ - μ0) / (s / √n)
t = (175.37 - 150) / (84.77 / √35) = 1.770567
Since the critical value is 1.690923 and 1.770567 is greater than it, we reject the null hypothesis.
Question 25 In a survey taken at the beach 47 people preferred iced tea 28 preferred lemonade and 25 preferred water If the manager of the Beach Hut is going to buy 50 cases of beverages for the next day about how many cases should be lemonade A B C D 28 100 28 50 28 14
ANSWER: Divide the number of people who preferred lemonade by the total number of people in the survey: 28 (lemonade) / 100 (people surveyed). Since the Beach Hut will only buy 50 cases, divide the numerator and denominator of the fraction by 2: 28/100 = 14/50. So 14 out of the 50 cases should be lemonade. Answer: D. 14