# Confidence Intervals for Proportions

Transcript

00:01 Welcome back for lecture 2.

00:03 What we're going to discuss is confidence intervals for proportions.

00:10 As we discussed previously, if we have several samples of the same size and computed proportions of successes in each sample, we're going to get several different answers.

00:19 So this is why we have sampling distributions but the question is, which answer is right? And the answer to that is, probably none of them.

00:28 So what do we do? Instead of using one single estimate to summarize the entire population, perhaps we could give a range of reasonable values for the population proportion.

00:39 We can construct this interval in such a way that we are guaranteed a certain probability that the interval captures the true value of the population parameter.

00:47 This interval is known as a confidence interval.

00:50 The primary benefit of using an interval instead of a single estimate is that we no longer have to rely on that one single value to estimate the population proportion. We now have a range of values that are reasonable given our data.

01:02 So how do we create this confidence interval? Well let's first recall the sampling distribution of p hat - the sample proportion.

01:09 Suppose that p is the population proportion, and that n is the sample size.

01:14 Provided the success/failure condition is satisfied and all of our observations are independent, we know the sampling distribution of p hat.

01:22 The distribution is approximately normal, with mean p and standard deviation equal to the square root of p times (1 minus p) over n. So let's look at an example.

01:32 Let's go back to the Belief in Evolution example.

01:34 We asked a hundred randomly selected people if they believe in evolution.

01:38 In this setting, we don't know the population proportion.

01:41 47 of the people in our sample say that they do believe in evolution.

01:45 We know our sample proportion: p hat equals 0.47. In order to create a confidence interval for p, the true proportion of people who believe in evolution, we need the sampling distribution of p hat.

01:59 So what's the problem? Remember that the sampling distribution depends on the population proportion.

02:06 We need to know p to get the standard deviation.

02:10 But we don't know it, so how do we find the standard deviation of p hat? The answer is quite simple, we don't.

02:17 We use the estimate p hat to find the quantity known as the standard error of p hat.

02:23 So instead of using p in the standard deviation calculation, we simply find the standard error of p hat by plugging in p hat for p.

02:31 So we have that the standard error of the sample proportion is the square root of p hat times (1 minus p hat) over the sample size.
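Written out in symbols, the two quantities described above differ only in whether the unknown p or its estimate p hat appears under the square root. The notation SD and SE is an editorial shorthand, not taken from the slides:

$$
\mathrm{SD}(\hat{p}) = \sqrt{\frac{p\,(1-p)}{n}},
\qquad
\mathrm{SE}(\hat{p}) = \sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}}
$$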

02:40 So the question is, what about the mean? What is it that we're trying to estimate? It turns out that we don't need it in the confidence interval, so that's a cool thing.

02:48 In this example, the standard error of the sample proportion is the square root of 0.47 times 0.53 over 100, which comes out to 0.0499. So what do we know? Well, we change the success/failure condition just a bit, and now we check that n times p hat is at least 10 and n times (1 minus p hat) is at least 10. Here, n times p hat is 100 times 0.47, or 47, and n times (1 minus p hat) is 100 times 0.53, or 53.

03:21 So the success/failure condition here is satisfied.
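The arithmetic for the evolution example can be checked with a few lines of Python. This is a minimal sketch: the function names `standard_error` and `success_failure_ok` are made up for illustration, and the threshold of 10 is the one stated in the lecture.

```python
import math

def standard_error(p_hat: float, n: int) -> float:
    """Standard error of a sample proportion: sqrt(p_hat * (1 - p_hat) / n)."""
    return math.sqrt(p_hat * (1 - p_hat) / n)

def success_failure_ok(p_hat: float, n: int, threshold: float = 10) -> bool:
    """Check that n * p_hat and n * (1 - p_hat) are both at least the threshold."""
    return n * p_hat >= threshold and n * (1 - p_hat) >= threshold

# Evolution example: 47 of 100 respondents say they believe in evolution.
n = 100
p_hat = 47 / n
se = standard_error(p_hat, n)

print(f"standard error: {se:.4f}")  # standard error: 0.0499
print("success/failure condition met:", success_failure_ok(p_hat, n))
```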

03:24 This tells us that the sampling distribution of p hat is approximately normal with mean p and standard error 0.0499. By the empirical rule, about 68% of all samples of size 100 will have a sample proportion within one standard error, 0.0499, of the population proportion. Again by the empirical rule, about 95% of all samples of size 100 will have a sample proportion within 2 standard errors, or 0.0998, of the population proportion.

03:58 So how do we construct the interval? Well we're trying to capture the population proportion.

04:03 So from the view of the sample proportion, we know that there's about a 95% chance that the population proportion is no more than 2 standard errors away from the sample proportion.

04:14 We can use this to our advantage. We simply add and subtract from p hat the value 2 times the standard error of p hat, and we have the endpoints of an interval that has a 95% chance of capturing the true population proportion.

04:29 So in our example, we'll have p hat plus or minus 2 times 0.0499, which gives us 0.47 plus or minus 0.0998, or an interval from 0.3702 to 0.5698. We have to be very cautious here.
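The interval for the evolution example can be reproduced as follows. This is a sketch only: the helper name `approx_95_interval` is hypothetical, and the multiplier of 2 is the empirical-rule approximation used at this point in the lecture.

```python
import math

def approx_95_interval(successes: int, n: int, multiplier: float = 2.0):
    """Return (low, high) for p_hat plus or minus multiplier * SE(p_hat)."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = multiplier * se
    return p_hat - margin, p_hat + margin

# 47 believers out of 100 respondents.
low, high = approx_95_interval(47, 100)
print(f"{low:.4f} to {high:.4f}")  # 0.3702 to 0.5698
```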

04:48 Even if the interval does capture the population proportion, we still don't know the value of the population proportion. We can't even be sure that the interval contains the population proportion.

05:00 So what can we say about the population proportion? Well, let's start with the things that we can't accurately say based on the confidence interval. What we cannot say is that 47 percent of all US adults believe in evolution.

05:14 The sample proportion is almost certainly not equal to the population proportion.

05:19 We also cannot say, "It is probably true that 47% of all US adults believe in evolution."

05:24 Again, we can't say that. It's probably not true, because the sample proportion is very unlikely to be equal to the population proportion. We also can't say the following: "We don't know exactly what proportion of US adults believe in evolution, but we know that it is between 37.02% and 56.98%." No, we don't.

05:45 We can't be sure that our interval contains the true population proportion.

05:49 So what can we say? Well, one thing that we can say is that we are 95% confident that between 37.02% and 56.98% of US adults believe in evolution. This is a statement that describes our confidence interval, and this is the best we can do.

06:08 This particular interval is known as a one-proportion z-interval.

06:12 We'll see several other types of confidence intervals a little bit later on in the course.

06:17 What do we mean by confidence? Well what we mean is confidence in the process and not necessarily the result.

06:24 So formally when we say 95% confidence, we don't refer to a 95% chance that our interval contains the true population proportion.

06:34 The population proportion is a fixed quantity, and it's either in the interval or it's not.

06:39 But we don't know the answer to that question. What we mean is that 95% of samples of this size will produce confidence intervals that capture the true population proportion.

06:50 So what we often say is we are 95% confident that the true proportion lies in our interval.

06:56 The uncertainty comes in whether the particular sample we have is one of the successful ones or one of the 5% that don't produce an interval that captures the true population proportion.

07:09 So what we can envision is for each sample, let's just draw a vertical line where the population proportion is.

07:15 And then we take a whole bunch of samples and we calculate a 95% confidence interval based on each sample and then lay the interval horizontally.

07:24 So we have this vertical line, and we might have one interval that's right around the line, one interval that has the line in it but barely, and one way over on one side that doesn't have the line in it. That would be one of the unsuccessful ones.

07:39 So basically, all the intervals that have the line going through them at any point will be the successful ones, and the intervals that don't have the line going through them will be the unsuccessful ones.
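The picture of many intervals laid against one vertical line can be simulated. The sketch below assumes a made-up true proportion of 0.5 (in practice it is unknown) and uses hypothetical names such as `coverage_rate`. It draws many samples of size 100, builds the "p hat plus or minus 2 standard errors" interval from each, and counts how often the interval captures the true value.

```python
import math
import random

def coverage_rate(true_p: float, n: int, num_samples: int,
                  multiplier: float = 2.0, seed: int = 0) -> float:
    """Fraction of simulated intervals that contain true_p."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(num_samples):
        # Draw one sample of n independent yes/no responses.
        successes = sum(rng.random() < true_p for _ in range(n))
        p_hat = successes / n
        se = math.sqrt(p_hat * (1 - p_hat) / n)
        # "Successful" interval: the vertical line at true_p passes through it.
        if p_hat - multiplier * se <= true_p <= p_hat + multiplier * se:
            hits += 1
    return hits / num_samples

rate = coverage_rate(true_p=0.5, n=100, num_samples=2000)
print(f"share of intervals capturing the true proportion: {rate:.3f}")
```

The share typically lands near, though not exactly at, 95%, since the interval relies on a normal approximation.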

07:49 What we would expect over the long run would be for 5% of the intervals we create to be unsuccessful. In other words, about 5% of our intervals don't have the line going through them. Let's look at margin of error and the trade-off between confidence and precision.

08:08 The margin of error is simply the half-width of our confidence interval. It's the extent of the interval on either side of the sample proportion. If we want a higher level of confidence, we need a larger margin of error. Let's think about archery.

08:22 If you have a big target, you're more confident that you're going to hit that than you are with a small target.

08:28 It's the same idea with the confidence interval, you are more confident that a bigger interval is going to contain the true population proportion than you are for a small interval.

08:38 So as a result, a smaller margin of error is associated with less confidence.

08:43 So there is a trade-off between confidence and precision, in the sense that higher confidence means less precision.

08:50 So how do we change the confidence level? Well, we find critical values.

08:56 In order to change the confidence level, what we need to do is change the number of standard errors we want to extend the interval away from the sample proportion.

09:04 That number of standard errors is known as the critical value.

09:08 So how do we find them? Once you've selected a confidence level, you can use the z-table to find the critical value, which we're going to denote as z star (z*). For a 95% confidence interval, the precise critical value is z* equals 1.96. For a 90% confidence interval, the precise critical value is 1.645. How do we come up with these? Well, we just look in the normal table. So we use the normal distribution to find critical values.

09:38 In finding the critical values for a 95% confidence interval, the aim is to find two values between which lie 95% of the values. This means that there's 2.5% left out in each tail, so using the z-table you'd look up the probability 0.9750. Why? Well, the z-table gives you the probability that a normal(0, 1) random variable takes a value less than or equal to z*. So if the probability that the normal(0, 1) random variable is greater than or equal to z* is 0.025, then the probability that it is less than or equal to z* is 0.975. So when we look in the body of the table, we find 0.975 and we find that z* is equal to 1.96. So here's a picture of what the critical region looks like for the normal distribution for this confidence interval. We just take the middle 95% of the normal distribution. We do a similar thing for a 90% confidence interval, where we just look at the middle 90% of the normal distribution.
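The table lookup described above can be mirrored with the inverse normal CDF from Python's standard library. This is a sketch, with `critical_value` as a made-up helper name. It also reuses the evolution example (p hat of 0.47, n of 100) to show how the margin of error grows with the confidence level.

```python
import math
from statistics import NormalDist

def critical_value(confidence: float) -> float:
    """z* such that the middle `confidence` share of N(0, 1) lies within +/- z*."""
    tail = (1 - confidence) / 2                # area left out in each tail
    return NormalDist().inv_cdf(1 - tail)      # e.g. look up 0.975 for 95%

# Standard error from the evolution example.
se = math.sqrt(0.47 * 0.53 / 100)

for level in (0.90, 0.95):
    z_star = critical_value(level)
    print(f"{level:.0%} confidence: z* = {z_star:.3f}, margin of error = {z_star * se:.4f}")
```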

10:41 In order to use the one-proportion z-interval, there are four important conditions that need to be satisfied for the process to work well.

10:49 First of all, we need all of our trials to be independent of each other. This is the independence assumption. Second, we need randomization.

10:57 In other words, the data need to be sampled or generated at random.

11:01 This can help ensure independence.

11:03 Third, the 10% condition.

11:05 The sample size shouldn't be greater than 10% of the population.

11:09 And finally, number four, the success/failure condition,

11:13 where we observe at least 10 successes and at least 10 failures.

11:17 In the evolution example, all these conditions are satisfied since the adults were chosen randomly, and 100 is far less than 10% of all US adults.

11:26 We verified the success/failure condition directly. How do we choose a sample size? Remember that we need to beat the trade-off between confidence and precision,

11:36 and there's only one way to increase confidence while maintaining the same level of precision, and that's to choose a larger sample size. So maybe we want to choose a sample size that gives us a certain confidence level with a specified precision.

11:49 The margin of error is given by the critical value z* times the square root of p hat times (1 minus p hat) over n. So we can use algebra to find the sample size needed to obtain a particular margin of error. This is typically done before any analysis is carried out, so at that point the sample proportion is unknown.

12:08 In order to be conservative, we need to have a margin of error as large as possible.

12:13 This is done by substituting .5 for p hat.

12:17 And in doing this with the algebra, we get that the desired sample size is n equals z* times 0.5 over the margin of error, that whole thing squared.

12:28 It is conservative because the margin of error is maximized when p hat equals one-half, or 0.5. That way, the n we get will work in giving us the desired margin of error regardless of what the value of p hat is. This is a worst-case scenario approach.
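The algebra referred to above can be written out as follows, using ME as an editorial shorthand for the margin of error. Solving the margin-of-error formula for n and then substituting the worst case of p hat equal to 0.5 gives:

$$
ME = z^{*}\sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}}
\;\Longrightarrow\;
n = \frac{(z^{*})^{2}\,\hat{p}\,(1-\hat{p})}{ME^{2}}
\;\xrightarrow{\;\hat{p}\,=\,0.5\;}\;
n = \left(\frac{z^{*}\cdot 0.5}{ME}\right)^{2}
$$

The substitution is the worst case because the product of p hat and (1 minus p hat) is never larger than 1/4, and it equals 1/4 exactly when p hat is 0.5.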

12:50 So let's try one.

12:52 So let's go back to the evolution example, and the goal here is to find the sample size necessary to obtain a margin of error of 0.03 with 95% confidence.

13:02 So assume we haven't taken a sample yet.

13:05 So here's the calculation: z* is still 1.96 and the margin of error is 0.03, so the desired sample size is z*, 1.96, times 0.5, all divided by the margin of error, 0.03, that whole quantity squared.

13:26 So that gives us a desired sample size of 1067.11. But there's a problem: we can't sample 0.11 of a person, so what do we do? Well, it's no big deal. In order to be conservative, we round up to the next whole number, so we would choose 1068 people. So what can go wrong? Well, here's a group of some of the common issues with confidence intervals.
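The sample-size calculation above, including the rounding up, can be sketched in a few lines. The function name `required_sample_size` and its parameter names are hypothetical. The default guess of 0.5 is the conservative worst case from the lecture.

```python
import math

def required_sample_size(margin_of_error: float, z_star: float = 1.96,
                         p_guess: float = 0.5) -> int:
    """Smallest whole n with z_star * sqrt(p_guess * (1 - p_guess) / n) <= margin_of_error."""
    n_exact = z_star ** 2 * p_guess * (1 - p_guess) / margin_of_error ** 2
    return math.ceil(n_exact)  # always round up to stay conservative

# Margin of error 0.03 at 95% confidence: 1067.11 rounds up to 1068.
print(required_sample_size(0.03))  # 1068
```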

13:50 And here are some pitfalls to avoid. Do not suggest that the population proportion varies; it does not.

13:56 It's a fixed quantity, it doesn't move around.

13:59 Don't claim that other samples will agree with yours.

14:02 Don't be certain about the parameter.

14:04 In statistics, we're "confident".

14:06 Statistics is all about modelling uncertainties.

14:09 So in statistics we're not certain about anything - we're confident.

14:14 Don't forget that the point is about estimating the population proportion.

14:19 Don't make confident statements about the sample proportion, you know that one.

14:23 There's no need to estimate something that you know.

14:26 Don't claim to know more than what your interval tells you and treat the whole interval equally.

14:31 Values near the center of the interval are not necessarily any more or any less plausible than values near the edges.

14:38 Beware of a margin of error that's too large to be useful. An interval of 10% to 90%, for instance, is not very helpful.

14:46 Watch out for biased sampling techniques, and think about whether or not your trials are independent. All right, so those are the common pitfalls of confidence intervals.

14:55 In this lecture, what we talked about was constructing confidence intervals for proportions. We described how we do it based on the normal distribution, we described why we do it, and we talked about the meaning of confidence. And then we talked about some of the common issues with confidence intervals.

15:12 So this is the end of lecture 2 and I look forward to seeing you back for lecture 3.

The lecture Confidence Intervals for Proportions by David Spade, PhD is from the course Statistics Part 2. It contains the following chapters:

• Confidence Intervals for Proportions
• Constructing the Interval
• Margin of Error
• Assumptions and Conditions
• Pitfalls to Avoid

### Included Quiz Questions

1. We know that we have constructed an interval that captures the true population proportion
2. We no longer need to rely on a single value to estimate the population proportion
3. We are guaranteed a certain probability that the interval captures the true value of the population parameter
4. We have more confidence in a range than in a single value as an estimate of the population proportion
1. We are 95% confident that the true population proportion is between a and b
2. We know that the true population proportion is between a and b
3. We know that the true population proportion is p̂
4. It is probably true that the population proportion is p̂
1. A smaller margin of error is associated with a smaller confidence interval
2. A smaller margin of error is associated with a larger confidence interval
3. A larger margin of error is associated with smaller confidence interval
4. We can change the confidence level using the same sample size without affecting the margin of error
1. We need our sample size to be at least 10% of the population
2. We need to observe at least 10 successes
3. We need to observe at least 10 failures
4. We need our sample to be random
1. The confidence level can be increased while also decreasing the margin of error by increasing the sample size
2. The interval is based on the sample proportion, so any statements you can make based on the interval should be about the sample proportion
3. 95% confidence means that you are 95% certain that the population proportion lies in your interval
4. Values near the center of the interval are more plausible than values near the edges

### Author of lecture Confidence Intervals for Proportions 