# Comparing two Proportions

Transcript

00:01 Welcome back for lecture 5 in which we're gonna talk about how to compare two proportions.

00:08 Psychologists suggest that men take more risks than women.

00:12 They question the effect of men having a woman by their side in reducing risk-taking behaviors. So, for example, they examine seatbelt usage.

00:20 And the question is whether or not male drivers wear their seatbelts more frequently when there's a woman in the car with them.

00:27 So they randomly selected 4208 male drivers with female passengers, and 2777, or 66%, of them were wearing their seatbelts. They also looked at a random sample of 2763 male drivers without women by their side, and 1363, or 49.3%, of them were wearing their seatbelts.

00:52 So the question is, does this suggest that male drivers are more likely to wear their seatbelts when a female is present in the car? So what should we look for? The first thing we need to do is we need to find the statistic to investigate the difference.

01:06 So we know that if x and y are independent random variables, then the variance of the difference between x and y is equal to the variance of x plus the variance of y. This means that the standard deviation of x minus y is equal to the square root of the variance of x plus the variance of y.

01:23 But remember that this only applies to independent random variables.
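In symbols, the rule just stated for independent random variables X and Y reads as follows. The variances add even though we take a difference, because Var(−Y) = Var(Y).

```latex
\[
\mathrm{Var}(X - Y) = \mathrm{Var}(X) + \mathrm{Var}(Y),
\qquad
\mathrm{SD}(X - Y) = \sqrt{\mathrm{Var}(X) + \mathrm{Var}(Y)}
\]
```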

01:27 So now we're dealing with two proportions.

01:30 Let's let p1 and p2 be the true proportions in the independent populations.

01:35 We're interested in the difference between the two, p1 minus p2. So how can we compare them? Let's look at the standard deviation of the difference of proportions. So we look at the difference of sample proportions.

01:49 We let p hat 1 and p hat 2 be the corresponding sample proportions for samples of size n1 and n2.

01:57 And let's recall that the variance of p hat 1 is p1 times 1 minus p1 over n1 and that the variance of p hat 2 is p2 times 1 minus p2 over n2.

02:09 Since the samples are drawn independently, the sample proportions are independent.

02:15 Therefore, we can look at the variance of p hat 1 minus p hat 2 as the sum of the variances of the two sample proportions.

02:23 Therefore the standard deviation of p hat 1 minus p hat 2 is just the square root of p1 times 1 minus p1 over n1 plus p2 times 1 minus p2 over n2. Usually we don't know the true population proportions, so we have to estimate them as we did before.
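Written as a formula, the standard deviation just described is:

```latex
\[
\mathrm{SD}(\hat{p}_1 - \hat{p}_2)
  = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}
\]
```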

02:41 This means calculating a standard error as opposed to a standard deviation. So we have that the standard error of p hat 1 minus p hat 2 is the square root of p hat 1 times 1 minus p hat 1 over n1 plus p hat 2 times 1 minus p hat 2 over n2. In order to form confidence intervals for the difference in proportions, we have to have some conditions satisfied, and there are four of them.

03:06 First of all, we need the groups to be independent.

03:08 In other words, the two groups that we are comparing are independent of each other.

03:13 The samples from each population are collected independently.

03:17 Two, the independent/randomization condition - this is not the same thing as one.

03:22 Each group's observations also have to be independent or randomly selected.

03:27 Three, the 10% condition.

03:29 As before, the samples should not be larger than 10% of their respective population sizes.

03:35 And four, the success/failure condition.

03:37 We observe at least 10 successes and 10 failures in each group.

03:42 If these conditions are satisfied, then we can form confidence intervals for the difference between the population proportions.
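Of the four conditions, only the success/failure condition is a plain arithmetic check; the other three depend on how the data were collected. Here is a minimal sketch of that check in Python. The function name `success_failure_ok` is hypothetical, and the counts are the seatbelt figures quoted earlier in the lecture.

```python
def success_failure_ok(successes: int, n: int, minimum: int = 10) -> bool:
    """Return True if a group has at least `minimum` successes and failures."""
    failures = n - successes
    return successes >= minimum and failures >= minimum


# (successes, sample size) for each group of male drivers.
groups = {
    "with female passenger": (2777, 4208),
    "without female passenger": (1363, 2763),
}

for name, (successes, n) in groups.items():
    print(name, "->", success_failure_ok(successes, n))
```

Both groups pass, since each has well over 10 belted and 10 unbelted drivers.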

03:49 So let's think about how we do it. While the interval looks similar to the one from before, we'll take p hat 1 minus p hat 2 this time, plus or minus z* times the standard error of p hat 1 minus p hat 2, so we're taking a statistic from a sample whose distribution we know, plus or minus z* times the standard error of that statistic.

04:09 This is something we've done before.
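In symbols, the interval has the familiar "statistic plus or minus critical value times standard error" shape, with the standard error defined as at 02:41:

```latex
\[
(\hat{p}_1 - \hat{p}_2) \pm z^{*}\,\mathrm{SE}(\hat{p}_1 - \hat{p}_2),
\qquad
\mathrm{SE}(\hat{p}_1 - \hat{p}_2)
  = \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}
\]
```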

04:12 So writing it out in full, we'll have p hat 1 minus p hat 2 plus or minus z* times the square root of p hat 1 times 1 minus p hat 1 over n1 plus p hat 2 times 1 minus p hat 2 over n2. We can use this to investigate the question of whether men are more likely to wear their seatbelts when a woman is in the car.

04:33 So let's do it, let's create a 95% confidence interval for the difference in the proportions.

04:38 Let p1 be the proportion of men that wear seatbelts when a woman is in the car and let p2 be the proportion of men that wear seatbelts when a woman is not in the car.

04:46 Let p hat 1 and p hat 2 be the corresponding sample proportions.

04:51 So first we have to check the conditions.

04:54 Do we have independent groups? Well the samples were taken independently so this condition is satisfied.

04:59 Do we have independence/randomization? The men were randomly selected in each group so condition two is satisfied.

05:07 Condition three, the 10% condition.

05:10 Both samples are a very small percentage of a large population so we're good on condition three.

05:15 And the success/failure condition.

05:17 We observed 2777 successes in the first group and 1363 successes in the second group. We also observed a sufficient number of failures in both groups, so the success/failure condition is satisfied.

05:33 So since all the conditions are satisfied, we can use what we call the two-proportion z-interval. So let's do the mechanics. We observed p hat 1 equals 0.66, we observed n1 equals 4208, p hat 2 equals 0.493, and n2 equals 2763.

05:55 The critical value for a 95% confidence interval, which we've seen before, is z* equals 1.96. So using the formula for the confidence interval that we've seen before, we get .66 minus .493 plus or minus 1.96 times the square root of .66 times .34 over 4208 plus .493 times .507 over 2763. This leads to an interval of about .1435 to .1905. So what we're saying here is that we're 95% confident that the proportion of men who wear seatbelts when a woman is in the car is between .1435 and .1905 higher than the same proportion when no women are in the car.
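The arithmetic can be checked with a short Python sketch. The function name `two_proportion_z_interval` is made up for illustration; the inputs are the rounded values quoted in the lecture (0.66, 4208, 0.493, 2763) and z* = 1.96. With these inputs the standard error is about 0.0120 and the margin of error about 0.0235, so the interval comes out at roughly (0.1435, 0.1905), which excludes zero.

```python
from math import sqrt


def two_proportion_z_interval(p1_hat, n1, p2_hat, n2, z_star=1.96):
    """Confidence interval for p1 - p2 using the unpooled standard error."""
    diff = p1_hat - p2_hat
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    margin = z_star * se
    return diff - margin, diff + margin


# Seatbelt example: group 1 = woman in the car, group 2 = no woman in the car.
low, high = two_proportion_z_interval(0.66, 4208, 0.493, 2763)
print(f"95% CI for p1 - p2: ({low:.4f}, {high:.4f})")
```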

06:43 Note that zero's not in the interval. This is important.

06:47 What this indicates is if we were to conduct a hypothesis test of the null hypothesis p1 minus p2 equals 0 against the alternative hypothesis p1 minus p2 is not equal to 0, we would reject H0 at the 5% significance level.

07:04 Let's look at hypothesis testing for a difference in proportions and we'll motivate this one with an example as well.

07:10 In a National Sleep Foundation study, 995 people were asked whether they snore.

07:15 They were broken up by age category. We had 184 people under the age of 30, of whom 26.1% snored.

07:23 We also had 811 in the over 30 age group, of whom 39.2% snored.

07:31 So the question is, is there a difference in the percentage of people over and under 30 who snore? And we'll need a hypothesis test to answer this question.

07:39 So in this example, let's let p1 denote the percentage of people 30 or younger that snore and let's let p2 be the same percentage in the over 30 age group.

07:48 We set the null hypothesis H0 to be p1 minus p2 equals 0. So let's measure variability, which means calculating a standard deviation. Under the null hypothesis, the difference in proportions is zero.

08:03 But we don't know the individual values so we need to estimate them.

08:06 We do this by what's called pooling the proportions and using them to calculate the standard error in the difference in the two sample proportions.

08:15 This pooled estimator is named p hat pooled and it's given by n1 p hat 1 plus n2 p hat 2 over n1 plus n2. In other words, it's the number of successes in the first group plus the number of successes in the second group divided by the total number of observations.
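As a formula, the pooled estimator just described is:

```latex
\[
\hat{p}_{\mathrm{pooled}}
  = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2}
  = \frac{\text{successes in group 1} + \text{successes in group 2}}{n_1 + n_2}
\]
```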

08:35 Then we calculate the standard error in the usual way.

08:38 The pooled standard error of p hat 1 minus p hat 2 is given by the square root of p hat pooled times 1 minus p hat pooled over n1 plus p hat pooled times 1 minus p hat pooled over n2. So in order to carry out the test, we need to satisfy the success/failure condition.

08:57 We rely on the pooled sample proportion now because we don't know the hypothesized population proportions individually. In order to use the following procedure, what we need is n1 times p hat pooled to be at least 10,

09:09 n1 times 1 minus p hat pooled to be at least 10, n2 times p hat pooled to be at least 10, and n2 times 1 minus p hat pooled to also be at least 10.

09:20 If this condition holds along with the independence/randomization, the independent groups and the 10% condition, we can use a z-test.

09:28 The test statistic for this hypothesis test is a z-statistic and is given by p hat 1 minus p hat 2 minus 0 divided by the pooled standard error of p hat 1 minus p hat 2. And we use this statistic to calculate p-values.
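In symbols, the pooled standard error and the test statistic described above are:

```latex
\[
\mathrm{SE}_{\mathrm{pooled}}(\hat{p}_1 - \hat{p}_2)
  = \sqrt{\frac{\hat{p}_{\mathrm{pooled}}(1 - \hat{p}_{\mathrm{pooled}})}{n_1}
        + \frac{\hat{p}_{\mathrm{pooled}}(1 - \hat{p}_{\mathrm{pooled}})}{n_2}},
\qquad
z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\mathrm{SE}_{\mathrm{pooled}}(\hat{p}_1 - \hat{p}_2)}
\]
```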

09:43 So let's go back to the snoring example.

09:45 We wanna test the hypothesis p1 minus p2 is equal to 0 against the alternative that p1 minus p2 is not equal to 0. Alright, we've stated the hypotheses, so now we have to check the conditions.

09:58 First, the independent groups assumption.

09:59 This is reasonable, since it's unlikely that what happens in one group affects what happens in the other group.

10:05 Secondly, the independence/randomization condition.

10:08 The individuals were randomly sampled in each group so we're okay here.

10:12 The 10% condition: the sample sizes are much less than 10% of the population size in each group so we're okay with three.

10:20 For the success/failure condition, we need p hat pooled in order to do it.

10:24 So let's find it. P hat pooled is 184 times .261 plus 811 times .392 divided by 184 plus 811, which gives us a pooled sample proportion of .3677. We have n1 times p hat pooled equal to 67.67, and n1 times 1 minus p hat pooled is 116.33. We have n2 times p hat pooled equal to 298.26, and n2 times 1 minus p hat pooled equal to 512.74. All of these are at least 10, so we have all the conditions satisfied and we can use the procedures described in the last slide.
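The pooled proportion and the four expected counts can be reproduced with a few lines of Python. This is only a sketch using the rounded sample proportions from the lecture; the variable names are illustrative.

```python
n1, p1_hat = 184, 0.261  # under-30 group
n2, p2_hat = 811, 0.392  # over-30 group

# Pooled proportion: total (estimated) successes over total observations.
p_pooled = (n1 * p1_hat + n2 * p2_hat) / (n1 + n2)

expected_counts = {
    "n1 * p_pooled": n1 * p_pooled,
    "n1 * (1 - p_pooled)": n1 * (1 - p_pooled),
    "n2 * p_pooled": n2 * p_pooled,
    "n2 * (1 - p_pooled)": n2 * (1 - p_pooled),
}

print(f"p_pooled = {p_pooled:.4f}")
for label, value in expected_counts.items():
    print(f"{label} = {value:.2f}")

# The success/failure condition requires every expected count to be at least 10.
condition_met = all(value >= 10 for value in expected_counts.values())
print("success/failure condition met:", condition_met)
```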

11:11 So let's do the mechanics.

11:13 First, we need the standard error. The pooled standard error of p hat 1 minus p hat 2, using the formula stated before, is the square root of .3677 times 1 minus .3677 over 184 plus .3677 times 1 minus .3677 over 811, which gives us a pooled standard error of 0.0394. The test statistic then is z equals p hat 1 minus p hat 2 divided by the pooled standard error, or .261 minus .392 over .0394, which gives us a z-statistic of minus 3.32. So the p-value is 2 times the probability that a normal (0,1) random variable is less than or equal to minus 3.32, which is the same thing as 2 times the probability that a normal (0,1) random variable takes a value at least as large as positive 3.32. We find this by taking 2 times 1 minus the probability that a normal (0,1) random variable takes a value less than or equal to 3.32, which gives us 2 times 0.00045, or 0.0009. So what does that mean? Well, that's a very small probability, so if our null hypothesis were true, what we observed would be very unlikely.
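The whole calculation fits in a small Python function. The sketch below is one possible way to do it, not the lecture's own code: the name `two_proportion_z_test` is hypothetical, and the normal tail probability is obtained from the standard library's `math.erfc`, using the identity 2·P(Z ≥ |z|) = erfc(|z|/√2). Because it carries full precision instead of rounding the standard error to 0.0394, the z-statistic comes out near −3.33, and the p-value still rounds to about 0.0009.

```python
from math import erfc, sqrt


def two_proportion_z_test(p1_hat, n1, p2_hat, n2):
    """Two-sided z-test of H0: p1 - p2 = 0 using the pooled standard error."""
    p_pooled = (n1 * p1_hat + n2 * p2_hat) / (n1 + n2)
    se_pooled = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat - 0) / se_pooled
    # Two-sided p-value for a standard normal: 2 * P(Z >= |z|).
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


# Snoring example: group 1 = under 30, group 2 = over 30.
z, p_value = two_proportion_z_test(0.261, 184, 0.392, 811)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```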

12:34 So we reject the null hypothesis and we conclude that there's a difference between the prevalence of snoring between the two age groups.

12:40 So in the two-proportion z-test, what can we do wrong? Things that we wanna avoid: We don't want to use the two-sample proportion methods when the samples are not independent.

12:52 We don't want to apply these methods if there's no randomization in each group, and we don't want to interpret a significant difference in proportions as evidence of cause.

13:01 Often proportions come from observational studies, which we know can't be used to determine cause. So what have we done in this lecture? Well, we basically discussed how to compare proportions from two different populations.

13:14 So we talked about how to form confidence intervals for the difference between two population proportions and we talked about how to carry out a hypothesis test for the difference between two population proportions.

13:25 We did a couple of examples, one with the seatbelts and one with the snoring and then at the end we talked about the things that could go wrong.

13:31 So be sure to avoid the things that can go wrong and to follow all the mechanics that are described in the previous sections of the chapter.

13:38 And this is the end of lecture 5.

The lecture Comparing two Proportions by David Spade, PhD is from the course Statistics Part 2. It contains the following chapters:

• Comparing Two Proportions
• Assumptions and Conditions
• Hypothesis Testing
• Snoring Example
• Example: Mechanics and Conclusion

### Included Quiz Questions

1. We usually do not know the population proportions, so finding the standard deviation is not possible and we have to estimate it using the standard error
2. The standard deviation of the difference in the proportions is simply the difference between the standard deviations of the two proportions
3. The standard deviation of the difference in the proportions is simply the sum of the standard deviations of the proportions
4. The standard error of the difference in the proportions is the sum of the standard errors of the individual proportions
1. The groups we are comparing are linearly related to each other
2. Each group’s observations are independent or are randomly selected
3. We need to observe at least 10 successes and 10 failures in each group
4. The sample sizes should not be larger than 10% of the respective population sizes.
1. We are 95% confident that p1 - p2 is between a and b
2. We know that the difference between the population proportions falls between a and b
3. The value p̂1 - p̂2 is a more plausible value for the difference in the population proportions than are values near a or near b
4. We are 95% certain that p1 - p2 is between a and b
1. We do not know either population proportion under the null hypothesis, so we construct a pooled estimate based on the two sample proportions
2. The procedures described in this chapter can be used if the groups are not independent
3. The procedures described in this chapter can be used if the individuals in the study are not randomly sampled
4. We do not know either sample proportion under the null hypothesis, so we have to construct a pooled estimate based on the population proportions
1. Based on the data we observed, the null hypothesis is not a reasonable explanation of the behavior between the two populations
2. We are certain that there is a difference between the two population proportions
3. We are certain that there is no difference between the two population proportions
4. We have evidence that there is no difference between the two population proportions
1. 0.24;0.09
2. 0.24;0.08
3. 0.24;0.07
4. 0.24;0.06
5. 0.24;0.05
1. 0.55
2. 0.57
3. 0.49
4. 0.51
5. 0.53