Statistics Study Guide : simplebooklet.com

TOPIC OUTLINE
i. BOX & WHISKER PLOT
ii. INTERQUARTILE RANGE
iii. STANDARD DEVIATION & VARIANCE
iv. EMPIRICAL RULE
v. CHEBYSHEV'S THEOREM
vi. SKEWED DATA

Page 10

i. BOX & WHISKER PLOTS
A Box & Whisker Plot (Box Plot) is a graphical rendition of statistical data based on the Minimum, First Quartile (Q1), Median, Third Quartile (Q3), and Maximum.
[Figure: box plot labeled Minimum, First Quartile (Q1), Median, Third Quartile (Q3), Maximum]

Page 11

ii. INTERQUARTILE RANGE
To find the interquartile range (IQR) of a box plot, subtract the first quartile from the third quartile: $\mathrm{IQR} = Q_3 - Q_1$.

Page 12

Example: on a box plot drawn over a number line from 0 to 20, Q1 = 5 and Q3 = 15, so IQR = Q3 - Q1 = 15 - 5 = 10. The interquartile range is 10.
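As a rough illustration, the five-number summary and the IQR can be computed with Python's standard `statistics` module. The data set below is hypothetical, chosen only so that the quartiles land on 5 and 15; the "inclusive" quartile method is one of several conventions, so other tools may give slightly different quartiles on other data.

```python
import statistics

# Hypothetical data, chosen so the quartiles match the example (Q1 = 5, Q3 = 15).
data = [0, 5, 10, 15, 20]

# quantiles(..., n=4) returns the three cut points: Q1, median, Q3.
# method="inclusive" treats the data as a full population (min and max included).
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")

# IQR is the third quartile minus the first quartile.
iqr = q3 - q1

print("Min, Q1, Median, Q3, Max:", min(data), q1, median, q3, max(data))
print("IQR:", iqr)
```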

Page 13

iii. STANDARD DEVIATION
Standard Deviation measures how spread out numbers are, for either a POPULATION ($\sigma$: all members in a specific group) or a SAMPLE ($s$: some of the members of a population).
SAMPLE: $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$
POPULATION: $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$
$\bar{x}$ or $\mu$ = Mean; N or n = Number of Members

Page 14

iv. VARIANCE
VARIANCE is the average of the squared differences from the mean, written $\sigma^2$ (population) or $s^2$ (sample).
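A minimal sketch of the sample and population formulas, written out by hand and compared against Python's `statistics` module. The function names and the data set are illustrative assumptions, not part of the guide.

```python
import math
import statistics

def population_variance(values):
    """sigma^2: mean of squared deviations from the mean (divide by N)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n

def sample_variance(values):
    """s^2: squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

# Hypothetical data with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

pop_var = population_variance(data)
pop_sd = math.sqrt(pop_var)          # standard deviation = square root of variance
samp_var = sample_variance(data)
samp_sd = math.sqrt(samp_var)

print("population variance, sd:", pop_var, pop_sd)
print("sample variance, sd:", samp_var, samp_sd)
```

The sample versions come out slightly larger than the population versions because they divide by n - 1 instead of N.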

Page 15

v. EMPIRICAL RULE
The empirical rule provides 3 quick estimates of the spread of data in a normal distribution, given the mean and standard deviation.

Page 16

About 68% of values fall within ONE standard deviation of the mean (the two central bands of 34% each).
About 95% of values fall within TWO standard deviations of the mean (bands of 13.6%, 34%, 34%, 13.6%).
About 99.7% of values fall within THREE standard deviations of the mean (bands of 2.2%, 13.6%, 34%, 34%, 13.6%, 2.2%).
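The three percentages can be checked numerically. The sketch below assumes an exactly normal distribution and uses the identity P(|Z| < k) = erf(k / sqrt(2)); the function name is an illustrative choice.

```python
import math

def within_k_sd(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {within_k_sd(k):.4f}")
```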

Page 17

vi. CHEBYSHEV'S THEOREM
Since the Empirical Rule only applies to bell-shaped data, Chebyshev's Theorem is used when the shape is unknown, because it applies to all data. CHEBYSHEV'S THEOREM states that the proportion of values from a set of data that fall within k standard deviations of the mean will be at least $1 - \frac{1}{k^2}$.

Page 18

Chebyshev's Theorem Formula: $P \ge 1 - \frac{1}{k^2}$
P = proportion of values from a set of data that fall within k standard deviations of the mean
k = number of standard deviations away from the mean (always greater than 1)
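A small sketch of the bound, assuming only the formula P >= 1 - 1/k^2 with k > 1; the function name is hypothetical.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of any data set within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k**2

for k in (2, 3):
    print(f"k = {k}: at least {chebyshev_lower_bound(k):.1%} of values")
```

For k = 2 the bound is 75%, noticeably weaker than the 95% the Empirical Rule gives for bell-shaped data, which is the price of making no assumption about shape.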

Page 19

vii. SKEWNESS
Skewness quantifies how symmetrical the distribution is. Data can have a Positive Skew (Right Skewed), a Negative Skew (Left Skewed), or No Skew (No Shift).

Page 20

POSITIVE SKEW: Mean > Mode
NEGATIVE SKEW: Mean < Mode
NO SKEW: Mean = Mode

Page 21

Page 22

TOPIC OUTLINE
i. SETS
ii. TYPES OF EVENTS
iii. RULES OF PROBABILITY
iv. THEORETICAL & EMPIRICAL PROBABILITY
v. PERMUTATION & COMBINATION PROBABILITY

Page 23

i. SETS
A SET is a collection of elements. SETS are compared to show their relation to one another by UNION, INTERSECTION, and COMPLEMENT.

Page 24

UNION ($A \cup B$): all elements that are in A, in B, or in both
INTERSECTION ($A \cap B$): elements where the sets overlap (in both A and B)
COMPLEMENT ($A^c$): elements not in the set A
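The three operations map directly onto Python's built-in `set` type. The sets below are hypothetical; a complement always needs a universe of all elements under consideration, which is assumed here to be the integers 1 to 10.

```python
# Hypothetical universe and sets for illustration.
universe = set(range(1, 11))
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}

union = a | b                # elements in A, in B, or in both
intersection = a & b         # elements in both A and B
complement_a = universe - a  # elements of the universe that are not in A

print("A union B:", sorted(union))
print("A intersect B:", sorted(intersection))
print("complement of A:", sorted(complement_a))
```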

Page 25

ii. TYPES OF EVENTS
INDEPENDENT EVENTS: events for which the occurrence of one has no impact on the occurrence of the other. Example: rolling a die twice; the results are independent.

Page 26

DEPENDENT EVENTS: events that impact each other's probability. Example: drawing two cards one at a time without returning them to the deck.
MUTUALLY EXCLUSIVE EVENTS: two events that cannot occur simultaneously. Example: tossing a coin; the occurrences of heads and tails are mutually exclusive.

Page 27

iii. RULES OF PROBABILITY
The three Rules of Probability are the (1) Complement Rule, (2) Addition Rule, and (3) Multiplication Rule.
1. COMPLEMENT RULE: $P(A^c) = 1 - P(A)$

Page 28

2. ADDITION RULE: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
If A and B are mutually exclusive: $P(A \cup B) = P(A) + P(B)$
3. MULTIPLICATION RULE: $P(A \cap B) = P(A)\,P(B \mid A)$ or $P(B)\,P(A \mid B)$
If A and B are independent: $P(A \cap B) = P(A)\,P(B)$
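A worked sketch of the three rules using exact fractions. The card and die scenarios are illustrative assumptions (a standard 52-card deck and a fair six-sided die), not examples taken from the guide.

```python
from fractions import Fraction

# Standard 52-card deck: 4 aces, 13 hearts, 1 card that is both (ace of hearts).
p_ace = Fraction(4, 52)
p_heart = Fraction(13, 52)
p_ace_and_heart = Fraction(1, 52)

# Complement rule: P(not ace) = 1 - P(ace)
p_not_ace = 1 - p_ace

# Addition rule: P(ace or heart) = P(ace) + P(heart) - P(ace and heart)
p_ace_or_heart = p_ace + p_heart - p_ace_and_heart

# Multiplication rule, dependent events: two aces drawn without replacement.
# P(second ace | first ace) = 3/51 because one ace has left the deck.
p_two_aces = p_ace * Fraction(3, 51)

# Multiplication rule, independent events: two sixes on two rolls of a fair die.
p_two_sixes = Fraction(1, 6) * Fraction(1, 6)

print(p_not_ace, p_ace_or_heart, p_two_aces, p_two_sixes)
```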

Page 29

iv. THEORETICAL & EMPIRICAL PROBABILITY
THEORETICAL is what should happen in an experiment: $P(E) = \frac{\text{\# of Favorable Outcomes}}{\text{\# of Possible Outcomes}}$
EMPIRICAL is what did happen in an experiment: $P(E) = \frac{\text{\# of Times Event Occurred}}{\text{\# of Trials}}$
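To make the contrast concrete, here is a small sketch with a hypothetical record of ten coin tosses. The theoretical probability comes from counting outcomes; the empirical probability comes from counting what was actually observed.

```python
from fractions import Fraction

# Theoretical: 1 favorable outcome (heads) out of 2 possible outcomes.
theoretical = Fraction(1, 2)

# Empirical: a hypothetical record of 10 tosses (H = heads, T = tails).
tosses = list("HHTHTHHTHH")
empirical = Fraction(tosses.count("H"), len(tosses))

print("theoretical P(heads):", theoretical)
print("empirical P(heads):", empirical)
```

The two values need not agree on a small number of trials; they tend to get closer as the number of trials grows.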

Page 30

v. PERMUTATION & COMBINATION PROBABILITY
PERMUTATION counts all possible outcomes when order IS important: ${}_nP_r = \frac{n!}{(n-r)!}$
COMBINATION counts all possible outcomes when order is NOT important: ${}_nC_r = \frac{n!}{r!\,(n-r)!}$
n = Total Possibilities; r = Selected Possibilities
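A quick check of both formulas against the built-in `math.perm` and `math.comb` (available in Python 3.8 and later). The values n = 5 and r = 3 are arbitrary illustrative choices.

```python
import math

n, r = 5, 3  # hypothetical: choose 3 items from 5

# Permutations: order matters, n! / (n - r)!
n_p_r = math.factorial(n) // math.factorial(n - r)

# Combinations: order does not matter, n! / (r! (n - r)!)
n_c_r = math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

print("5P3 =", n_p_r, "5C3 =", n_c_r)
```

Each combination of 3 items can be ordered in 3! = 6 ways, which is why the permutation count is 6 times the combination count.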

Page 31

Page 32

TOPIC OUTLINE
i. BINOMIAL DISTRIBUTION
ii. NORMAL DISTRIBUTION
iii. Z-SCORE
iv. HOW TO READ STANDARD NORMAL TABLE

Page 33

i. BINOMIAL DISTRIBUTION
The Binomial Distribution is a frequency distribution of the possible number of successful outcomes in a given number of trials, in each of which there is the same probability of success.

Page 34

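$P(x) = \binom{n}{x}\, p^x (1-p)^{n-x} = \frac{n!}{x!\,(n-x)!}\, p^x (1-p)^{n-x}$ (the probability of x successes out of n)
n = Total Number of trials; x = Number of successes we want; p = probability of success on each trial
The Binomial Distribution is used for discrete variables, while the Normal Distribution is used for continuous variables.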
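A minimal sketch of the binomial formula, assuming independent trials with the same success probability p. The function name and the coin example are illustrative.

```python
import math

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n trials, each with success probability p."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

# Hypothetical example: exactly 2 heads in 4 tosses of a fair coin.
p_two_heads = binomial_pmf(2, 4, 0.5)
print("P(2 heads in 4 tosses) =", p_two_heads)

# The probabilities over all possible x (0 through n) add up to 1.
total = sum(binomial_pmf(x, 4, 0.5) for x in range(5))
print("sum over x =", total)
```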

Page 35

ii. NORMAL DISTRIBUTION
The Normal Distribution is the most common probability distribution. The mean is at the center, the curve is symmetrical, and it never touches the horizontal axis. Using the Z-score, it helps find the probability of a data value falling within a given number of standard deviations from the mean.

Page 36

[Figure: BELL-SHAPED CURVE centered on the Mean, with the horizontal axis marked in Standard Deviations From the Mean (-3 to 3) and bands of 2.2%, 13.6%, 34%, 34%, 13.6%, 2.2%]

Page 37

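iii. Z-SCORE
A Z-Score represents how many standard deviations a data value is from the mean: $z = \frac{x - \mu}{\sigma}$
[Figure: bell curve with Z-scores from -3 to 3 on the horizontal axis and bands of 2.2%, 13.6%, 34%, 34%, 13.6%, 2.2%]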

Page 38

iv. HOW TO READ STANDARD NORMAL TABLE
The Standard Normal Table shows the cumulative probability associated with a particular z-score.

Page 39

Table ROWS show the whole number and tenths place of the z-score, and COLUMNS show the hundredths place. Example: the probability to the left of z = 0.31 is 0.6217 (row 0.3, column .01).

Z     .00     .01     .02     .03
0.0   .5000   .5040   .5080   .5120
0.1   .5398   .5438   .5478   .5517
0.2   .5793   .5832   .5871   .5910
0.3   .6179   .6217   .6255   .6293
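Table entries like these can be reproduced with a short sketch that uses the error function from Python's `math` module. The function names are illustrative assumptions; the z-score helper applies z = (x - mean) / standard deviation.

```python
import math

def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

def standard_normal_cdf(z):
    """Cumulative probability to the left of z under the standard normal curve."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Reproduce the table lookup for z = 0.31.
print(round(standard_normal_cdf(0.31), 4))

# Hypothetical value: x = 65.5 from a distribution with mean 50 and SD 50.
z = z_score(65.5, 50, 50)
print(z, round(standard_normal_cdf(z), 4))
```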

Page 40

Page 41

TOPIC OUTLINE
i. WHAT IS HYPOTHESIS TESTING
ii. HYPOTHESIS TESTING STEPS
iii. DECISION RULES
iv. TEST STATISTICS
v. ONE-TAILED & TWO-TAILED TESTS
vi. DECISION ERRORS

Page 42

i. HYPOTHESIS TESTING
Hypothesis Testing is the procedure of testing a statement by rejecting or accepting the null hypothesis. If the null hypothesis is rejected, then the alternative hypothesis is accepted.
[Figure: known distribution with a central "Accept Null Hypothesis" region and a "Reject Null Hypothesis" region in each tail, bounded by the Level of Significance]

Page 43

The Null Hypothesis (H0) is a statement about the value of a population parameter. The Alternative Hypothesis (H1 or Ha) is the statement that is accepted if the evidence shows the null hypothesis to be false.

Page 44

ii. HYPOTHESIS TESTING STEPS
1. State the Null Hypothesis
2. State the Alternative Hypothesis
3. Set the Level of Significance
4. Collect Data
5. Calculate a test statistic
6. Specify the Regions of Acceptance
7. State your conclusion about the null hypothesis based on the data

Page 45

iii. ONE- & TWO-TAILED TESTS
In hypothesis testing, a One-Tailed Test is one in which the region of rejection is on only one side of the sampling distribution. A Two-Tailed Test is one in which the region of rejection is on both sides of the sampling distribution.

Page 46

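A One-Tailed Test is used when H1 is $\mu > k$ or $\mu < k$:
$H_0: \mu \le k$, $H_1: \mu > k$ (rejection region of size $\alpha$ in the right tail)
$H_0: \mu \ge k$, $H_1: \mu < k$ (rejection region of size $\alpha$ in the left tail)
A Two-Tailed Test is used when H1 is $\mu \ne k$:
$H_0: \mu = k$, $H_1: \mu \ne k$ (rejection region of size $\alpha/2$ in each tail)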

Page 47

iv. DECISION RULES
Decision rules for rejecting or accepting the null hypothesis are based on the P-value and the Region of Acceptance.

Page 48

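The P-VALUE is the probability, assuming the null hypothesis is true, of getting a test statistic at least as extreme as the one observed. If the p-value is less than the level of significance, the null hypothesis is rejected. In a one-tailed test, 1 - p-value is sometimes quoted informally as a level of confidence, but it is not the probability that the alternative hypothesis is true.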

Page 49

The REGION OF ACCEPTANCE is the range of values of the test statistic for which the null hypothesis is not rejected. If the test statistic falls within this range, the null hypothesis is accepted. If the test statistic falls outside this range, the null hypothesis is rejected, which is written as "The hypothesis has been rejected at the α level of significance."

Page 50

v. TEST STATISTICS
A Test Statistic is used to assess the strength of the evidence against a null hypothesis, using either a Z-test or a T-test.

Page 51

The Z-Test is used for testing the mean of a population versus a standard, or comparing the means of two populations, with large (n ≥ 30) samples: $z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$

Page 52

The t-Test is similar to the z-test, but t-tests are used with small samples (n < 30): $t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$
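Both statistics share the same shape: the distance between the sample mean and the hypothesized mean, divided by the standard error. A rough sketch follows, with hypothetical function names and made-up numbers; the z version assumes the population standard deviation is known, and the t version uses the sample standard deviation with n - 1 degrees of freedom.

```python
import math

def z_statistic(sample_mean, mu0, sigma, n):
    """One-sample z: uses the population standard deviation sigma."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))

def t_statistic(sample_mean, mu0, s, n):
    """One-sample t: uses the sample standard deviation s; df = n - 1."""
    return (sample_mean - mu0) / (s / math.sqrt(n))

# Hypothetical large sample: mean 52, hypothesized mean 50, sigma 10, n = 100.
z = z_statistic(52, 50, 10, 100)

# Hypothetical small sample: mean 52, hypothesized mean 50, s = 4, n = 16.
t = t_statistic(52, 50, 4, 16)
df = 16 - 1

print("z =", z)
print("t =", t, "with", df, "degrees of freedom")
```

The resulting value is then compared with a critical value from the z or t table at the chosen level of significance.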

Page 53

vi. DECISION ERRORS
A Type I error occurs when the researcher rejects a null hypothesis that is true. A Type II error occurs when the researcher fails to reject a null hypothesis that is false.

Page 54

                    Null (H0) is False    Null (H0) is True
Decision: Accept    Type II error         Correct Decision
Decision: Reject    Correct Decision      Type I error

The Significance Level is the probability of making a Type I error, and the Power of a hypothesis test is the probability of not committing a Type II error.

Page 55

Page 56

TOPIC OUTLINE
i. CORRELATION
ii. TYPES OF CORRELATION
iii. SIMPLE REGRESSION
iv. MULTIPLE REGRESSION

Page 57

i. CORRELATION
Correlation is a measure of association between two variables, which can be either dependent or independent.

Page 58

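Correlation Formula: $r_{xy} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \, \sum_{i=1}^{n}(y_i - \bar{y})^2}}$
Equivalent computational form: $r_{xy} = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left(n\sum x^2 - (\sum x)^2\right)\left(n\sum y^2 - (\sum y)^2\right)}}$
n = Number of pairs of scores
$\sum xy$ = Sum of the products of paired scores
$\sum x$ = Sum of x scores
$\sum y$ = Sum of y scores
$\sum x^2$ = Sum of squared x scores
$\sum y^2$ = Sum of squared y scores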
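A minimal sketch of the correlation coefficient in its deviation form (sum of products of deviations over the square root of the product of the summed squared deviations). The function name and the tiny data sets are illustrative assumptions.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired scores xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: sum of products of deviations from each mean.
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the summed squared deviations.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return numerator / math.sqrt(ss_x * ss_y)

# Hypothetical data: y doubles x exactly, then the same y values reversed.
r_up = pearson_r([1, 2, 3], [2, 4, 6])
r_down = pearson_r([1, 2, 3], [6, 4, 2])
print(r_up, r_down)
```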

Page 59

ii. TYPES OF CORRELATION
The correlation coefficient measures the strength of the relationship between two variables and ranges from -1 to 1. If there is no correlation, the correlation coefficient is 0.

Page 60

POSITIVE CORRELATION: as one variable increases, the other increases.
NEGATIVE CORRELATION: as one variable increases, the other decreases.

Page 61

PERFECT CORRELATION: the strongest relationship, where the correlation coefficient is either -1 or 1.
NO CORRELATION: there is no relationship between the two variables.

Page 62

iii. SIMPLE REGRESSION
Simple regression is used to examine the relationship between one dependent and one independent variable. Regression goes beyond correlation by adding predictions of what will happen to the dependent variable if the independent variable changes.

Page 63

EXAMPLE
Age (x):          20    30    40    50    60
Heart Rate (y):   192   185   178   172   165
In this example, heart rate is the dependent variable because it depends on age. The relationship shows that as Age increases, the person's Heart Rate decreases. So a prediction would suggest that the older someone is, the lower their heart rate will be.
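One way to turn this table into a prediction is an ordinary least-squares line, heart rate = intercept + slope × age. The sketch below is an assumption about method (the guide does not say how its line would be fitted) and the function names are hypothetical.

```python
ages = [20, 30, 40, 50, 60]
heart_rates = [192, 185, 178, 172, 165]

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(ages, heart_rates)

def predict(age):
    """Predicted heart rate for a given age, using the fitted line."""
    return intercept + slope * age

print(f"slope = {slope:.2f}, intercept = {intercept:.1f}")
print(f"predicted heart rate at age 45: {predict(45):.2f}")
```

The negative slope matches the description above: each additional year of age lowers the predicted heart rate by about two-thirds of a beat.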

Page 64

iv. MULTIPLE REGRESSION
MULTIPLE REGRESSION is an extension of simple linear regression. It is used to predict the value of a Dependent Variable based on the values of two or more Independent Variables.

Page 65

EXAMPLE
Heart Rate (y):   192    185    178    172    165
Age (x1):         20     30     40     50     60
Height (x2):      5'7"   6'4"   5'9"   5'8"   5'4"
Gender (x3):      M      M      M      F      F
In this example, Age, Height, and Gender are Independent because they don't affect one another. A Multiple Regression would be able to predict what would happen to a person's heart rate based on their Age, Height, and Gender.

Page 66

Page 67

Key Terms Outline Population Sample Statistics Inferential Statistics Descriptive Statistics Statistic Census Variable Discrete Variable Continuous Variable

Page 68

Nominal Variable Ordinal Variable Control Group Treatment Group Bias Controlled Experiment Observational Study Placebo Blind Experiment Double Blind Experiment Homogeneous Group

Page 69

Heterogeneous Group Random Sampling Convenience Sampling Systematic Sampling Stratified Sampling Cluster Sampling Degrees of Freedom

Page 70

POPULATION: A group of elements (people, objects, etc.) whose measurements are of interest.

Page 71

SAMPLE A subset of a population on which data is collected in order to learn more about the population

Page 72

STATISTICS A scientific field of study that provides approaches to making inferences about populations based on the examination of smaller sets of data

Page 73

INFERENTIAL STATISTICS: The large body of techniques that are based on probability and reasoning and that use sample data to make formal inferences (conclusions) about the population from which the sample was selected.

Page 74

DESCRIPTIVE STATISTICS: The large body of techniques that are used primarily to summarize sample data, including graphical techniques (plots and graphs) that visually illustrate data, and calculated statistical values.

Page 75

Page 76

CENSUS: The identification and measurement of all individuals in a population (sample = entire population).

Page 77

VARIABLE The representation of some characteristic of an individual that can be measured or recorded and generally can take on multiple values

Page 78

DISCRETE VARIABLE: A variable that has a finite or countable number of possible values. Example: number of pets in a household.

Page 79

CONTINUOUS VARIABLE: A variable that has an unlimited and uncountable number of possible values. Examples: height, time.

Page 80

NOMINAL VARIABLE: A classification variable whose values do not have an intrinsic ordering (contains the least amount of information). Examples: colors, gender, long-distance phone company.

Page 81

ORDINAL VARIABLE: An ordered variable for which the distance between possible values does not have any meaning or comparability (contains the second-lowest amount of information). Examples: chili heat scale, football rankings, credit risk rating.

Page 82

CONTROL GROUP: A group of subjects in a study that does not receive the treatment.

Page 83

TREATMENT GROUP The group of subjects in a study that do receive the treatment

Page 84

BIAS An influence on the outcome of a study that is unanticipated or unaccounted for

Page 85

CONTROLLED EXPERIMENT A study having a control group and in which researchers control which subjects receive treatment

Page 86

OBSERVATIONAL STUDY: A study in which the researchers cannot control which subjects receive the treatment, but can only observe the outcome of the treatment on the subjects who did and did not receive it, for whatever reason (can't control outside factors).

Page 87

PLACEBO: A fake treatment, often given to people in a control group so that the subjects in the control group and in the treatment group do not know which group they are really in.

Page 88

BLIND EXPERIMENT: An experiment (typically one in which a placebo is given) in which subjects do not know if they are receiving the treatment.

Page 89

DOUBLE BLIND EXPERIMENT An experiment in which neither subjects nor the people who examine the subjects know if a subject is receiving the treatment

Page 90

HOMOGENEOUS GROUP: A group in which the subjects are very similar in every aspect that might influence the outcome being investigated in a study.

Page 91

HETEROGENEOUS GROUP A group in which the subjects are very diverse in important aspects that might influence the outcome being investigated in a study

Page 92

RANDOM SAMPLING Any method of selecting a sample where the probability of inclusion in the sample can be calculated for any individual in the target population

Page 93

CONVENIENCE SAMPLING chooses the individuals easiest to reach

Page 94

SYSTEMATIC SAMPLING: Every nth member of the population is selected after the population has been arranged by some characteristic.

Page 95

STRATIFIED SAMPLING: The population is divided into subpopulations (strata), and a sample is taken from each stratum in proportion to its size.

Page 96

CLUSTER SAMPLING A sampling technique in which clusters of participants that represent the population are used

Page 97

DEGREES OF FREEDOM: Degrees of freedom (df) is the number of observations minus the number of restrictions.

Page 98

Page 99

Question 1 Statistics is the study of A Testing and interpreting statistical hypotheses about a relationship B Mathematical analysis using samples instead of populations C Summarizing analyzing or drawing inferences about a relationship D Inferring something about a population from a sample

Page 100

ANSWER Statistics is the study of Summarizing analyzing or drawing inferences about a relationship Answer C

Page 101

Question 2 Which of the following is not a variable? A Personal fantasies B The final score of the Knicks game on 3/31/16 C Body temperatures of people who are NOT sick D Age

Page 102

ANSWER Personal fantasies body temperatures of people who are NOT sick and Age are all variables Answer B

Page 103

Question 3 What kind of variable is a person's height? A Discrete B Continuous C Ungrouped D Grouped

Page 104

Page 105

Question 4 In a positively skewed distribution: A The bulk of the scores are on the left side B The bulk of the scores are on the right side C The bulk of the scores are in the middle D There is no bulk of scores

Page 106

ANSWER In a positively skewed distribution the bulk of the scores are on the left side Answer A

Page 107

Question 5 Kurtosis is the measure of: A The amount that the peak is shifted positively or negatively B The shape of the distribution's peak C Whether the distribution has enough scores (N) D The cumulative relative frequency

Page 108

ANSWER Kurtosis is the measure of the shape of the distribution's peak. Answer B

Page 109

Question 6 The median is a better choice of measuring central tendency than the mean when A The distribution is bimodal B The distribution is heavily positively skewed C The distribution is leptokurtic D The distribution is a normal curve

Page 110

ANSWER The median is a better choice of measuring central tendency than the mean when the distribution is heavily positively skewed Answer B

Page 111

Question 7 Why do we use standard deviation instead of variance A Standard deviation takes into account the N of the sample B Standard deviation is less susceptible to outlier scores than the variance C Standard deviation has the same scale of measurement as the mean

Page 112

ANSWER Standard deviation is used instead of variance because standard deviation has the same scale of measurement as the mean. Answer C

Page 113

Question 8 Which of the following is NOT a requirement of a perfect normal distribution A Identical mean median and mode B Asymptotic distribution C Has a standard deviation of 1 D Has range of negative infinity to positive infinity

Page 114

ANSWER Perfect normal distribution Has Identical mean median and mode asymptotic distribution and range of negative infinity to positive infinity Answer C

Page 115

Question 9 Which of the following is FALSE about standard normal distributions A Standard deviation 1 B Mean 0 C Easier to compare values across variables D Converting scores to a standard normal distribution turns the distribution into a perfect normal distribution

Page 116

ANSWER Standard deviation 1 mean 0 and easier to compare values across variables are all true about standard normal distributions Answer D

Page 117

Question 10 If you were to look at the scores on a test from a single classroom and assume the findings apply to the whole school, you would be using: A Descriptive statistics B Inferential statistics C Sample statistics D Population statistics

Page 118

Page 119

Question 11 Which of these is the least effective type of sampling? A Snowball sampling B Convenience sampling C Random sampling D Stratified random sampling

Page 120

ANSWER Convenience sampling is the least effective of these types of sampling. Answer B

Page 121

Question 12 Which of the following is false about H0 and H1 A They must be mutually exclusive B They must be appropriate for the rejection criteria C There can be no overlap or exceptions D They must be all encompassing

Page 122

ANSWER They must be mutually exclusive There can be no overlap or exceptions and They must be all encompassing are all true about H0 and H1 Answer B

Page 123

Question 13 When testing something that has very little room for error, what is the best choice to set alpha to? A .005 B .10 C .95 D .995

Page 124

ANSWER Alpha refers to the significance level, and .005 is the lowest of the numbers given. Answer A

Page 125

Question 14 _____ is when you accept the null hypothesis when it's false. A Type 1 Error B Type 2 Error C H0 Error D H1 Error

Page 126

ANSWER Type 2 Error is when you accept the null hypothesis when it's false. Answer B

Page 127

Question 15 _____ is used when you have a sample mean estimated standard error population mean and N A Independent samples t test B Repeated measures t test C One sample t test D Bivariate t test

Page 128

ANSWER One sample t test is used when you have a sample mean estimated standard error population mean and N Answer C

Page 129

Question 16 If you're doing a t-test with N = 30 and you want to use 95% confidence, you should look up the t-table value at: A 29 and .05 B 29 and .95 C 30 and .05 D 30 and .95

Page 130

Page 131

Question 17 If you're doing a t-test and your N = 32, you should look up the t-table value at: A 30 B 40 C Average of 30 and 40 D Average of 30 and 40 with 80% weight on the 30

Page 132

Page 133

USE THE FOLLOWING SCENARIO FOR QUESTION 18: A group of statistics professors decides to evaluate the effect of milk consumption on height by monitoring the height of students in a classroom where milk is supplied daily and the height of students in a classroom where milk is not supplied daily. The professors set their rejection parameters to t = ±2.045 with an alpha of .05. After running their analysis, they discover that their independent-samples t-test has returned t = 1.940.

Page 134

Question 18 What conclusion should they reach and what error are they at risk of committing A Accept null hypothesis Type 1 Error B Accept null hypothesis Type 2 Error C Reject null hypothesis Type 1 Error D Reject null hypothesis Type 2 Error

Page 135

Page 136

Question 19 A die is tossed 24 times and lands on 1 three times Find the theoretical probability of landing on a 1 A B C D 24 3 24 1 1 6

Page 137

ANSWER There is a 1/6 chance that the die would land on 1, so the theoretical probability would be 1/6. Answer D: 1/6

Page 138

Question 20 A die is tossed 24 times and lands on 1 three times Find the experimental probability of landing on a 1 A B C D 1 8 3 24 1 24

Page 139

ANSWER The die landed on 1 three times out of 24 total tosses, so the experimental probability would be 3/24, which simplifies to 1/8. Answer A: 1/8

Page 140

Question 21 The numbers of restaurants in 11 malls in a city are 2, 4, 7, 7, 9, 9, 9, 10, 15, 18, 20. What are the mean, median, and mode?

Page 141

ANSWER Mean = sum of all samples divided by total number of samples. Mean = (2 + 4 + 7 + 7 + 9 + 9 + 9 + 10 + 15 + 18 + 20) / 11 = 110 / 11 = 10. Median is the middle number in the data, which is 9. Mode is the number that appears most often, which is 9. Mean = 10, Median = 9, Mode = 9
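The three values can be double-checked with Python's standard `statistics` module; this is only a verification sketch of the arithmetic above.

```python
import statistics

restaurants = [2, 4, 7, 7, 9, 9, 9, 10, 15, 18, 20]

mean = statistics.mean(restaurants)      # sum of values divided by count
median = statistics.median(restaurants)  # middle value of the sorted data
mode = statistics.mode(restaurants)      # most frequent value

print("mean:", mean, "median:", median, "mode:", mode)
```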

Page 142

Question 22 The numbers of pets in 12 households on one block are 0, 0, 1, 1, 2, 3, 3, 3, 4, 4, 5, 7. What are the variance and standard deviation in this sample?

Page 143

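ANSWER Plug into the sample variance formula: $S^2 = \frac{\sum (X - \bar{X})^2}{N - 1} = \frac{48.25}{11} = 4.3864$. Take the square root of the variance to find the standard deviation: $S = \sqrt{4.3864} = 2.094$. $S^2 = 4.3864$, $S = 2.094$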
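As a check on this answer, the same sample statistics can be computed with the `statistics` module, whose `variance` and `stdev` functions use the n - 1 (sample) divisor.

```python
import statistics

pets = [0, 0, 1, 1, 2, 3, 3, 3, 4, 4, 5, 7]

sample_var = statistics.variance(pets)  # divides by n - 1
sample_sd = statistics.stdev(pets)      # square root of the sample variance

print(f"sample variance = {sample_var:.4f}")
print(f"sample standard deviation = {sample_sd:.3f}")
```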

Page 144

USE THE FOLLOWING SCENARIO FOR QUESTION 23: A designer brand is designing a new pair of exclusive sneakers for men. Men have shoe sizes that are normally distributed with a mean of 6.0 in and a standard deviation of 1.0 in. Due to financial constraints, the sneakers will be designed to fit all men except those with shoe sizes that are in the smallest 2.5% or largest 2.5%.

Page 145

Question 23 Find the minimum and maximum shoe sizes that will fit men

Page 146

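ANSWER Find the z-scores that bound the middle of the population, centered at 0: 100% - 2.5% - 2.5% = 95% of the population, so z = -1.96 and z = +1.96. Now plug each z-score into $x = \mu + \sigma z$ to find the minimum and maximum: x = 6 + 1(-1.96) = 4.04, so Minimum = 4.04; x = 6 + 1(1.96) = 7.96, so Maximum = 7.96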
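The cutoffs can also be found without a table by using `statistics.NormalDist` (Python 3.8 and later), whose `inv_cdf` method returns the value below which a given proportion of the distribution falls. This is a verification sketch of the answer above, using the mean and standard deviation from the scenario.

```python
from statistics import NormalDist

# Shoe sizes from the scenario: mean 6.0 in, standard deviation 1.0 in.
sizes = NormalDist(mu=6.0, sigma=1.0)

minimum = sizes.inv_cdf(0.025)  # cutoff for the smallest 2.5%
maximum = sizes.inv_cdf(0.975)  # cutoff for the largest 2.5%

print(f"minimum = {minimum:.2f}, maximum = {maximum:.2f}")
```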

Page 147

USE THE FOLLOWING SCENARIO FOR QUESTION 24: A bar allows its customers to keep a tab. Among 35 randomly selected customers with a tab at the bar, it was found that the mean amount owed was $175.37, while the standard deviation was $84.77.

Page 148

Question 24 Using a significance level of 0.05, test the claim that the mean amount owed by all customers is greater than $150.00.

Page 149

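ANSWER Create a null and an alternative hypothesis: $H_0: \mu \le 150$, $H_a: \mu > 150$. Next, run a t-test: $t = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{175.37 - 150}{84.77 / \sqrt{35}} = 1.770567$. Since the test statistic is greater than the critical value of 1.690923 (one-tailed, 34 degrees of freedom), we reject the null hypothesis.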

Page 150

Question 25 In a survey taken at the beach 47 people preferred iced tea 28 preferred lemonade and 25 preferred water If the manager of the Beach Hut is going to buy 50 cases of beverages for the next day about how many cases should be lemonade A B C D 28 100 28 50 28 14

Page 151

ANSWER Divide the number of people who preferred lemonade by the total number of people in the survey: 28 lemonade / 100 people surveyed. Since the Beach Hut will only buy 50 cases, divide the numerator and denominator by 2: 28/100 = 14/50. So 14 out of the 50 cases should be lemonade. Answer D: 14