## Statistical Methods (STAT 4303) Review for Final Comprehensive Exam

### Measures of Central Tendency and Dispersion

**Q.1** The data below represent the test scores obtained by students in a college algebra class:

10, 12, 15, 20, 13, 16, 14

Calculate (a) the mean, (b) the median, (c) the mode, (d) the variance, $s^2$, and (e) the coefficient of variation (CV).

**Q.2** The data below represent the test scores obtained by students in an English class:

12, 15, 16, 18, 13, 10, 17, 20

Calculate (a) the mean, (b) the median, (c) the mode, (d) the variance, $s^2$, and (e) the coefficient of variation (CV). (f) Compare the results of Q.1 and Q.2. Which scores, College Algebra or English, do you think are more precise (less spread)?

**Q.3** The following data represent the scores obtained by students in one of the exams:

9, 13, 14, 15, 16, 16, 17, 19, 20, 21, 21, 22, 25, 25, 26

Create a frequency table and use it to calculate the following descriptive statistics:

- (a) Mean
- (b) Median
- (c) Mode
- (d) First and third quartiles
- (e) Construct a box-and-whisker plot.
- (f) Comment on the shape of the distribution.
- (g) Find the interquartile range (IQR).
- (h) Are there any outliers (based on the IQR technique)?

In the above problem, if the score 26 is replaced by 37:

- (i) What will happen to the mean? Will it increase, decrease, or remain the same?
- (j) What will be the new median?
- (k) What can you say about the effect of outliers on the mean and the median?

**Q.4** The following data represent the scores obtained by students in one of the exams:

19, 14, 14, 15, 17, 16, 17, 20, 20, 21, 21, 22, 25, 25, 26, 27, 28

Create a frequency table and use it to calculate the following descriptive statistics:

- (a) Mean
- (b) Median
- (c) Mode
- (d) First and third quartiles
- (e) Construct a box-and-whisker plot.
- (f) Comment on the shape of the distribution.
- (g) Find the interquartile range (IQR).
- (h) Are there any outliers (based on the IQR technique)?

In the above problem, if the score 28 is replaced by 48:

- (i) What will happen to the mean? Will it increase, decrease, or remain the same?
- (j) What will be the new median?
- (k) What can you say about the effect of outliers on the mean and the median?

**Q.5** Consider the following frequency table of height (in inches).

| Height (x) | Frequency (f) |
|---|---|
| 50 | 2 |
| 52 | 3 |
| 55 | 2 |
| 60 | 4 |
| 62 | 3 |

- (a) Find the mean height.
- (b) What is the variance of height? Also, find the standard deviation.
- (c) Find the coefficient of variation (CV).

**Q.6** The following table shows the number of miles run during one week by a sample of 15 runners:

| Miles | Mid-value (x) | Frequency (f) |
|---|---|---|
| 5.5-10.5 | | 1 |
| 10.5-15.5 | | 2 |
| 15.5-20.5 | | 3 |
| 20.5-25.5 | | 5 |
| 25.5-30.5 | | 4 |

- (a) Find the average (mean) number of miles run. (Hint: find the mid-value of each mile range first.)
- (b) What is the variance of miles run? Also, find the standard deviation.
- (c) Find the coefficient of variation (CV).

**Q.7**

- (a) If the mean of 20 observations is 20.5, what is the sum of all observations?
- (b) If the mean of 30 observations is 40, what is the sum of all observations?

### Probability

**Q.8** Out of forty students, 14 are taking English Composition and 29 are taking Chemistry.

- (a) How many students are in both classes?
- (b) What is the probability that a randomly chosen student from this group is taking only the Chemistry class?

**Q.9** A drawer contains 4 red balls, 5 green balls, and 5 blue balls. One ball is taken from the drawer and then replaced. Another ball is then taken from the drawer. Draw a tree diagram to facilitate your calculation. What is the probability that

- (a) both balls are red?
- (b) the first ball is red?
- (c) both balls are the same color?
- (d) the two balls are different colors?
- (e) the first ball is red and the second ball is blue?
- (f) the first ball is red or blue?

**Q.10** A drawer contains 3 red balls, 5 green balls, and 5 blue balls. One ball is taken from the drawer and not replaced. Another ball is then taken from the drawer. Draw a tree diagram to facilitate your calculation. What is the probability that

- (a) both balls are red?
- (b) the first ball is red?
- (c) both balls are the same color?
- (d) the two balls are different colors?
- (e) the first ball is red and the second ball is blue?
- (f) the first ball is red or blue?

**Q.11** Missile A has a 45% chance of hitting the target. Missile B has a 55% chance of hitting the target. What is the probability that

- (i) both miss the target?
- (ii) at least one hits the target?
- (iii) exactly one hits the target?

**Q.12** A politician from Party D speaks the truth 65% of the time; a politician from a rival party speaks the truth 75% of the time. Both politicians were asked about a personal love affair with their own office secretary. What is the probability that

- (i) both lie about the actual fact?
- (ii) at least one speaks the truth?
- (iii) exactly one speaks the truth?
- (iv) both speak the truth?

**Q.13** The question “Do you drink alcohol?” was asked of 220 people. The results are shown in the table.

| | Yes | No | Total |
|---|---|---|---|
| Male | 48 | 82 | |
| Female | 24 | 66 | |
| Total | | | |

- (a) What is the probability that a randomly selected individual is a male who drinks?
- (b) What is the probability that a randomly selected individual is female?
- (c) What is the probability that a randomly selected individual drinks?
- (d) A person is selected at random. If the person is female, what is the probability that she drinks?
- (e) What is the probability that a randomly selected person who drinks is male?

**Q.14** A professor, Dr. Drakula, taught courses that included students from across the five colleges, abbreviated AH, AS, BA, ED, and EN. He taught at Texas A&M University – Kingsville (TAMUK) during the five academic years AY09 to AY13. The following table shows the total number of graduates during AY09 to AY13.

One day, he was running late to his class. He was so focused on the class that he did not stop at a red light. As soon as he crossed the intersection, a police officer asked him to stop.

- (a) It turned out that the police officer was a TAMUK graduate from the past five years. What is the probability that the police officer was from the ED college?
- (b) What is the probability that the police officer graduated in the academic year 2011?
- (c) If the police officer graduated from TAMUK in the academic year 2011 (AY11), what is the conditional probability that he graduated from the ED college?
- (d) Are the events “AY11” and “College of Education (ED)” independent? Yes or no? Why?

### Discrete Distributions

**Q.15** Find $k$, and then find the probabilities for $X = 2$ and $X = 4$.

| X | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| P(X = x) | 0.1 | 3k | 0.2 | 2k | 0.2 |

(Hint: first find $k$, and then plug it in.) Also, calculate the expected value of $X$, $E(X)$, and the variance, $V(X)$.

A game is based on the above table: a player wins \$5 if he blindly chooses 3 and loses \$1 if he chooses any other number. What is his expected win or loss per game? If he plays this game 20 times, what is his total win or loss?

**Q.16** Find $k$.

| X | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|
| P(X = x) | k | 2k | 2k | k | 2k |

(Hint: first find $k$, and then plug it in.) Also, calculate the expected value of $X$, $E(X)$, and the variance, $V(X)$.

A game is based on the above table: a player wins \$5 if he blindly chooses 3 and loses \$1 if he chooses any other number. What is his expected win or loss per game? If he plays this game 20 times, what is his total win or loss?

### Binomial Distribution

**Q.17**

- (a) Hospital records show that, of patients suffering from a certain disease, 75% die of it. What is the probability that, of 6 randomly selected patients, 4 will recover?
- (b) A (blindfolded) marksman finds that on average he hits the target 4 times out of 5. If he fires 4 shots, what is the probability of
  - (i) more than 2 hits?
  - (ii) at least 3 misses?
- (c) Which of the following are binomial experiments? Explain the reason.
  - i. Telephone surveying a group of 200 people to ask if they voted for George Bush.
  - ii. Counting the average number of dogs seen at a veterinarian’s office daily.
  - iii. You take a survey of 50 traffic lights in a certain city at 3 p.m., recording whether each light was red, green, or yellow at that time.
  - iv. You are at a fair, playing “pop the balloon” with 6 darts. There are 20 balloons.
    10 of the balloons have a ticket inside that says “win,” and 10 have a ticket that says “lose.”

### Normal Distribution

**Q.18** Use the standard normal distribution table to find the following probabilities:

- (a) $P(Z < 2.5)$
- (b) $P(Z < -1.3)$
- (c) $P(Z > 0.12)$
- (d) $P(Z > -2.15)$
- (e) P(0.11 …

**Q.19** …

- (c) $P(Z > ?) = 0.87$
- (d) $P(Z > ?) = 0.34$

**Q.20** The length of life of a certain type of light bulb is normally distributed with mean 220 hrs and standard deviation 20 hrs.

- (a) Define a random variable, $X$.

A light bulb is randomly selected. What is the probability that

- (b) it will last more than 207 hrs?
- (c) it will last less than 214 hrs?
- (d) it will last between 199 and 207 hrs?

**Q.21** The length of life of an instrument produced by a machine has a normal distribution with a mean of 22 months and a standard deviation of 4 months. Find the probability that an instrument produced by this machine will last

- (a) less than 10 months.
- (b) more than 28 months.
- (c) between 10 and 28 months.

### Distribution of Sample Mean and Central Limit Theorem (CLT)

**Q.22** It is assumed that the weight of teenage students is normally distributed with mean 140 lbs and standard deviation 15 lbs. A simple random sample of 40 teenage students is taken and the sample mean is calculated. If several such samples of the same size are taken,

- (i) what would be the mean of all sample means?
- (ii) what would be the standard deviation of all sample means?
- (iii) will the distribution of sample means be normal?
- (iv) What is the CLT? Write down the distribution of the sample mean in the form $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$.

**Q.23** The time it takes students in a cooking school to learn to prepare seafood gumbo is a random variable with a normal distribution, with a mean of 3.2 hours and a standard deviation of 1.8 hours. A sample of 40 students was investigated. What is the distribution of the sample mean (express it in numbers)?

### Hypothesis Testing

**Q.24** The NCHS reported that the mean total cholesterol level in 2002 for all adults was 203, with a standard deviation of 37.
Total cholesterol levels in participants who attended the seventh examination of the Offspring in the Framingham Heart Study are summarized as follows: $n = 300$, $\bar{x} = 200.3$. Is there statistical evidence of a difference in mean cholesterol levels in the Framingham Offspring (that is, does the result from the current examination differ from the 2002 report)? Follow the steps below to reach the conclusion.

- (i) Define the null and alternative hypotheses. (Also write what $\mu$, $\sigma$, and $\bar{x}$ are in words at the beginning.)
- (ii) Identify the significance level, $\alpha$, and check whether it is a one-sided or two-sided test.
- (iii) Calculate the test statistic, $Z$.
- (iv) Use the standard normal table to find the p-value, and state whether you reject or accept (fail to reject) the null hypothesis.
- (v) What is the critical value? Do you reject or accept $H_0$?
- (vi) Write down the conclusion based on part (iv).

**Q.25** A sample of 145 boxes of Kellogg’s Raisin Bran contains on average 1.95 scoops of raisins. It is known from past experiments that the standard deviation of the number of scoops of raisins is 0.25. The manufacturer of Kellogg’s Raisin Bran claims that on average their product contains more than 2 scoops of raisins. Do you reject or accept the manufacturer’s claim? (Follow all five steps.)

**Q.26** It is assumed that the mean systolic blood pressure is $\mu = 120$ mm Hg. In the Honolulu Heart Study, a sample of $n = 100$ people had an average systolic blood pressure of 130.1 mm Hg. The population standard deviation is 21.21 mm Hg. Is the group significantly different (with respect to systolic blood pressure) from the regular population? Use a 10% level of significance.

**Q.27** A CEO claims that at least 80 percent of the company’s 1,000,000 customers are very satisfied. To test this, 100 customers are surveyed using simple random sampling. The result: 73 percent are very satisfied. Based on these results, should we accept or reject the CEO’s hypothesis? Assume a significance level of 0.05.
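
The Z tests in Q.24 to Q.27 can be cross-checked with a short Python sketch. It uses only the standard library and computes the standard normal CDF Φ from `math.erf` in place of a printed table; the helper names (`phi`, `z_mean`, `z_prop`, `p_value`, `decide`) are hypothetical. It assumes two-sided tests for Q.24 and Q.26, an upper-tailed test for Q.25 (the manufacturer's claim μ > 2 taken as the alternative), a lower-tailed test for Q.27 (H0: p ≥ 0.80), and α = 0.05 wherever the problem does not state a level.

```python
import math


def phi(z: float) -> float:
    """Standard normal CDF, P(Z <= z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def z_mean(xbar: float, mu0: float, sigma: float, n: int) -> float:
    """Z = (x-bar - mu0) / (sigma / sqrt(n)) for a test of a mean with sigma known."""
    return (xbar - mu0) / (sigma / math.sqrt(n))


def z_prop(phat: float, p0: float, n: int) -> float:
    """Z = (p-hat - p0) / sqrt(p0 * q0 / n) for a test of a proportion."""
    return (phat - p0) / math.sqrt(p0 * (1.0 - p0) / n)


def p_value(z: float, tail: str) -> float:
    """p-value of a 'two'-sided, 'lower'-tailed or 'upper'-tailed Z test."""
    if tail == "two":
        return 2.0 * (1.0 - phi(abs(z)))
    if tail == "lower":
        return phi(z)
    if tail == "upper":
        return 1.0 - phi(z)
    raise ValueError("tail must be 'two', 'lower' or 'upper'")


def decide(z: float, tail: str, alpha: float) -> str:
    """Reject H0 when the p-value is at most alpha; otherwise fail to reject."""
    return "reject H0" if p_value(z, tail) <= alpha else "fail to reject H0"


if __name__ == "__main__":
    # (problem, Z statistic, tail, alpha); alpha = 0.05 is assumed for Q.24 and Q.25
    cases = [
        ("Q.24", z_mean(200.3, 203, 37, 300), "two", 0.05),
        ("Q.25", z_mean(1.95, 2, 0.25, 145), "upper", 0.05),
        ("Q.26", z_mean(130.1, 120, 21.21, 100), "two", 0.10),
        ("Q.27", z_prop(0.73, 0.80, 100), "lower", 0.05),
    ]
    for label, z, tail, alpha in cases:
        p = p_value(z, tail)
        print(f"{label}: Z = {z:.2f}, p = {p:.4f}, {decide(z, tail, alpha)}")
```

Running it prints Z, the p-value, and the decision for each problem; the Z values agree with the answer key (-1.26, -2.41, 4.76, -1.75) after rounding.
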

**Q.28** True/False questions (these questions are collected from previous HW, review, and exam problems; see the previous solutions for answers).

- (a) The total sum of probabilities can exceed 1.
- (b) If you throw a die, getting 2 and getting an even number are independent events.
- (c) If you roll a die 20 times, the probability of getting 5 on the 15th roll is $\frac{15}{20}$.
- (d) A student is taking a 5-question True-False quiz, but he has not been doing any work in the course and does not know the material, so he randomly guesses all the answers. The probability that he gets the first question right is $\frac{1}{2}$.
- (e) Typing on a laptop and writing emails using the same laptop are independent events.
- (f) The normal distribution is right-skewed.
- (g) The mean is more robust to outliers, so the mean is used for data with extreme values.
- (h) It is possible to have no mode in the data.
- (i) The standard normal variable, $Z$, has a unit.
- (j) Only two parameters are required to describe the entire normal distribution.
- (k) The mean of the standard normal variable, $Z$, is 1.
- (l) If the p-value is more than the level of significance ($\alpha$), we reject $H_0$.
- (m) A very small p-value indicates rejection of $H_0$.
- (n) $H_0$ always contains an equality sign.
- (o) The CLT indicates that the distribution of the sample mean can be anything, not just normal.
- (p) The sample mean is always equal to the population mean.
- (q) The variance of the sample mean is less than the population mean.
- (r) The variance of the sample mean does not depend on the sample size.
- (s) Mr. A has cancer, but a medical doctor diagnosed him as “no cancer”. This is a Type I error.
- (t) The level of significance is the probability of making a Type II error.
- (u) Type II error can be controlled.
- (v) Type I error is more serious than Type II error.
- (w) Type I and Type II errors are based on the null hypothesis.

**Q.29** Type I and Type II errors: make statements about Type I (false positive) and Type II (false negative) errors.

- (a) The Alpha-Fetoprotein (AFP) Test has both Type I and Type II error possibilities.
  This test screens the mother’s blood during pregnancy for AFP and determines risk. Abnormally high or low levels may indicate Down syndrome. (Hint: take the actual status as Down syndrome or not.)
  - $H_0$: the patient is healthy.
  - $H_a$: the patient is unhealthy.
- (b) A mechanic inspects the brake pads for the minimum allowable thickness.
  - $H_0$: the vehicle’s brakes meet the standard for the minimum allowable thickness.
  - $H_a$: the vehicle’s brakes do not meet the standard for the minimum allowable thickness.
- (c) Celiac disease is one of the diseases that can be misdiagnosed or underdiagnosed. The following table shows the actual celiac patients and their diagnosis status by medical doctors:

  | Diagnosed as celiac | Actual status: Yes | Actual status: No |
  |---|---|---|
  | Yes | 85 | 5 |
  | No | 25 | 105 |

  - I. Calculate the probabilities of making Type I and Type II errors.
  - II. Calculate the power of the test (power of the test = 1 - P(Type II error)).

### Useful Formulae

**Descriptive Statistics**

- Possible outliers: any value outside the range $Q_1 - 1.5(Q_3 - Q_1)$ to $Q_3 + 1.5(Q_3 - Q_1)$.
- Range = maximum value - minimum value.
- For data without repeats: $\bar{x} = \dfrac{\sum x}{n}$ and $s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$ (preferred) or $s^2 = \dfrac{\sum x^2 - n\bar{x}^2}{n - 1}$.
- For data with repeats: $\bar{x} = \dfrac{\sum fx}{n}$ and $s^2 = \dfrac{\sum f(x - \bar{x})^2}{n - 1}$ (preferred) or $s^2 = \dfrac{\sum fx^2 - n\bar{x}^2}{n - 1}$, where $n = \sum f$.
- Coefficient of variation: $CV = \dfrac{s}{\bar{x}} \times 100$.

**Discrete Distribution**

- $E(X) = \sum x\,P(X = x)$, $\quad E(X^2) = \sum x^2\,P(X = x)$, $\quad V(X) = E(X^2) - [E(X)]^2$.

**Binomial Distribution**

- Probability mass function: $P(X = x) = \binom{n}{x} p^x q^{n - x}$ for $x = 0, 1, 2, \ldots, n$.
- $E(X) = np$, $\quad \operatorname{Var}(X) = npq$.

**Hypothesis Testing Based on the Normal Distribution**

- Standard normal variable: $Z = \dfrac{X - \text{mean}(X)}{\text{std}(X)}$.

**Probability**

- Bayes' rule: $P(A \mid B) = \dfrac{P(A \text{ and } B)}{P(B)}$.

**Central Limit Theorem**

- For large $n$ ($n > 30$): $\bar{X} \sim N\left(\mu, \dfrac{\sigma^2}{n}\right)$ and $\hat{p} \sim N\left(p, \dfrac{pq}{n}\right)$.
- For hypothesis testing of $\mu$ ($\sigma$ known): $Z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$.
- For hypothesis testing of $p$: $Z = \dfrac{\hat{p} - p}{\sqrt{pq / n}}$.

### Answers

- **Q.1** (a) 14.286 (b) 14 (c) none (d) 10.24 (e) 22.40
- **Q.2** (a) 15.125 (b) 15.5 (c) none (d) 10.98 (e) 21.9 (f) English
- **Q.3** (a) 18.6 (b) 19 (c) 16, 21, and 25 (d) 15, 22 (f) slightly left-skewed (g) 7 (h) no outliers (i) increase (j) same
- **Q.4** (a) 20.41 (b) 20 (c) 14, 17, 20, 21, 25 (d) 16.5, 25 (f) slightly right-skewed (g) 8.5 (h) no (i) increase (j) same
- **Q.5** (a) 56.57 (b) 22.26 (c) 8.34
- **Q.6** (a) 21 (b) 38.57 (c) 29.57
- **Q.7** (a) 410 (b) 1200
- **Q.8** (a) 3 (b) 0.65
- **Q.9** (a) 0.082 (b) 0.29 (c) 0.34 (d) 0.66 (e) 0.10 (f) 0.64
- **Q.10** (a) 0.038 (b) 0.23 (c) 0.29 (d) 0.71 (e) 0.096 (f) 0.62
- **Q.11** (i) 0.248 (ii) 0.752 (iii) 0.505
- **Q.12** (i) 0.0875 (ii) 0.913 (iii) 0.425 (iv) 0.488
- **Q.13** (a) 0.22 (b) 0.41 (c) 0.33 (d) 0.27 (e) 0.67
- **Q.14** (a) 0.13 (b) 0.18 (c) 0.12
- **Q.15** E(X) = 3.1, V(X) = 1.69; \$0.20 win per game; \$4 total win.
- **Q.16** E(X) = 5.125, V(X) = 1.86; \$0.25 loss per game; \$5 total loss.
- **Q.17** (a) 0.033 (b) (i) 0.819 (ii) 0.027
- **Q.18** (a) 0.9938 (b) 0.0968 (c) 0.452 (d) 0.984 (e) 0.0433 (f) 0.2353
- **Q.19** (a) -0.25 (b) 0.71 (c) -1.13 (d) 0.41
- **Q.20** (b) 0.7422 (c) 0.3821 (d) 0.1109
- **Q.21** (a) 0.0014 (b) 0.0668 (c) 0.9318
- **Q.22** (i) 140 (ii) 2.37
- **Q.24** Z = -1.26, accept the null.
- **Q.25** Z = -2.41, accept the null.
- **Q.26** Z = 4.76, reject $H_0$.
- **Q.27** Z = -1.75, reject $H_0$.
- **Q.28** F, F, F, T, F, F, F, T, F, T, F, F, T, T, F, F, T, F, T, F, F, T, T
- **Q.29** (c) 0.113, 0.022, 0.977 (or 98%)
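
The descriptive-statistics answers for Q.1 to Q.6 can be reproduced with Python's standard `statistics` module. This is a minimal sketch with hypothetical helper names (`modes`, `cv`, `quartiles`, `outliers`, `grouped_stats`). It assumes that quartiles are the medians of the lower and upper halves of the sorted data, with the median left out when n is odd; several quartile conventions exist, and this one reproduces the quartiles in the answer key. `statistics.variance` and `statistics.stdev` divide by n - 1, matching the sample-variance formulae.

```python
import math
import statistics
from collections import Counter


def modes(data):
    """All most-frequent values; an empty list means no value repeats (no mode)."""
    counts = Counter(data)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top) if top > 1 else []


def cv(data):
    """Coefficient of variation in percent: s / x-bar * 100, with s using n - 1."""
    return statistics.stdev(data) / statistics.mean(data) * 100


def quartiles(data):
    """Q1 and Q3 as medians of the lower and upper halves (median left out when n is odd)."""
    xs = sorted(data)
    half = len(xs) // 2
    return statistics.median(xs[:half]), statistics.median(xs[len(xs) - half:])


def outliers(data):
    """Values outside [Q1 - 1.5 IQR, Q3 + 1.5 IQR]."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    return [x for x in data if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]


def grouped_stats(values, freqs):
    """Mean, sample variance (n - 1) and CV (%) for a frequency table, with n = sum of f."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    var = (sum(f * x * x for x, f in zip(values, freqs)) - n * mean**2) / (n - 1)
    return mean, var, math.sqrt(var) / mean * 100


if __name__ == "__main__":
    q1 = [10, 12, 15, 20, 13, 16, 14]
    print("Q.1", statistics.mean(q1), statistics.median(q1), modes(q1),
          statistics.variance(q1), cv(q1))
    q3 = [9, 13, 14, 15, 16, 16, 17, 19, 20, 21, 21, 22, 25, 25, 26]
    print("Q.3", quartiles(q3), outliers(q3), outliers(q3[:-1] + [37]))
    print("Q.5", grouped_stats([50, 52, 55, 60, 62], [2, 3, 2, 4, 3]))
    # Q.6 uses class mid-values: (5.5 + 10.5) / 2 = 8, and so on
    print("Q.6", grouped_stats([8, 13, 18, 23, 28], [1, 2, 3, 5, 4]))
```

For Q.3, replacing 26 with 37 leaves the quartiles at 15 and 22, so the upper fence is 22 + 1.5 × 7 = 32.5 and 37 is flagged as an outlier, while the median stays at 19.
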

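
Several probability answers can be checked the same way. This minimal sketch assumes Python 3.8 or later (for `math.comb` and `statistics.NormalDist`); the helper names (`expectation_variance`, `binom_pmf`, `p_same_colour`) are hypothetical. Discrete tables use exact `Fraction` values, the two-draw problems (Q.9 and Q.10) are solved by enumerating every ordered pair of balls, and normal probabilities come from `NormalDist.cdf` rather than a printed table, so the last digit can differ from table-based answers (for example, about 0.1110 instead of 0.1109 for Q.20 (d)).

```python
import math
from fractions import Fraction
from itertools import permutations, product
from statistics import NormalDist


def expectation_variance(xs, ps):
    """E(X) = sum of x P(X=x) and V(X) = E(X^2) - [E(X)]^2 for exact (Fraction) probabilities."""
    if sum(ps) != 1:
        raise ValueError("probabilities must sum to 1")
    ex = sum(x * p for x, p in zip(xs, ps))
    ex2 = sum(x * x * p for x, p in zip(xs, ps))
    return ex, ex2 - ex * ex


def binom_pmf(x, n, p):
    """Binomial P(X = x) = nCx p^x q^(n - x)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


def p_same_colour(counts, replace):
    """P(two draws have the same colour), by enumerating every ordered pair of balls."""
    balls = [colour for colour, k in counts.items() for _ in range(k)]
    idx = range(len(balls))
    pairs = list(product(idx, repeat=2)) if replace else list(permutations(idx, 2))
    same = sum(balls[i] == balls[j] for i, j in pairs)
    return Fraction(same, len(pairs))


if __name__ == "__main__":
    # Q.15: 0.5 + 5k = 1, so k = 1/10
    k = Fraction(1, 10)
    ps = [Fraction(1, 10), 3 * k, Fraction(2, 10), 2 * k, Fraction(2, 10)]
    ex, vx = expectation_variance([1, 2, 3, 4, 5], ps)
    p3 = ps[2]  # win $5 on X = 3, lose $1 otherwise
    print("Q.15", float(ex), float(vx), float(5 * p3 - (1 - p3)))
    # Q.16: 8k = 1, so k = 1/8
    k = Fraction(1, 8)
    ex, vx = expectation_variance([3, 4, 5, 6, 7], [k, 2 * k, 2 * k, k, 2 * k])
    print("Q.16", float(ex), float(vx), float(5 * k - (1 - k)))
    # Q.17 (a): each patient recovers with probability 0.25
    print("Q.17a", binom_pmf(4, 6, 0.25))
    # Q.17 (b): hit probability 0.8 on 4 shots; at least 3 misses = at most 1 hit
    print("Q.17b", binom_pmf(3, 4, 0.8) + binom_pmf(4, 4, 0.8),
          binom_pmf(0, 4, 0.8) + binom_pmf(1, 4, 0.8))
    # Q.9 (c) with replacement, Q.10 (c) without replacement
    print("Q.9c", float(p_same_colour({"red": 4, "green": 5, "blue": 5}, replace=True)))
    print("Q.10c", float(p_same_colour({"red": 3, "green": 5, "blue": 5}, replace=False)))
    # Q.20 from the normal CDF; Q.22 standard error sigma / sqrt(n) from the CLT
    bulb = NormalDist(mu=220, sigma=20)
    print("Q.20", 1 - bulb.cdf(207), bulb.cdf(214), bulb.cdf(207) - bulb.cdf(199))
    print("Q.22", 15 / math.sqrt(40))
```

With k = 1/10 in Q.15 and k = 1/8 in Q.16, the script gives E(X) = 3.1, V(X) = 1.69 and E(X) = 5.125, V(X) ≈ 1.86; for Q.17 (a) it gives C(6, 4)(0.25)^4(0.75)^2 ≈ 0.033.
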