Welcome back for lecture 3, in which we discuss
testing hypotheses about proportions.
So let's start with an example to motivate what
it is that we're trying to do in this lecture.
Suppose that in a particular
college-level mathematics course,
it's known that 20% of the students earn a grade
of D or F or withdraw from the course.
The department of mathematics has
decided to redesign the course
in the hopes of decreasing the percentage
of students that get a D or F or withdraw.
After a year of the redesigned course, the department
randomly selects the records of 400 students
and finds that in the sample, 17% of
them had a D or an F or they withdrew.
The question is, should the department
continue with the redesign and
conclude that it's a success, or is this
lower rate in the sample simply due to chance?
So to answer these kinds of questions, we
test hypotheses about statistical models.
So how do we set up hypotheses?
Well what we're doing is we're
setting up a probability model.
So in setting up the hypothesis, what we wanna do is
we wanna see if a change or an effect has taken place.
And in order to do that, we first assume that it hasn't
and then we let the data try to convince us otherwise.
We temporarily adopt this assumed model and
then evaluate it once we have the data.
This starting hypothesis is
called the null hypothesis
and it's called this because it's the
assumption that no change has occurred.
In the redesign example, we wanna start by
assuming that the percentage of students
that withdrew or got a
D or an F is still 20%.
So we write this in the following form:
we'll write H0: parameter
equals hypothesized value.
So in this case, if p is the proportion of students with a
D or an F or who withdrew, we write H0: p equals 0.2.
We also have an alternative hypothesis.
This is what we wanna conclude if we
decide that H0 is not plausible.
So we denote this H1 and what it does is it
contains the values of the population parameter
that we consider plausible if
we decide that H0 is not true.
In the redesign example, we're interested in
reducing the rate of these Ds, Fs, and withdrawals.
So our alternative hypothesis then
will be written H1: p less than 0.2.
Otherwise stated, the alternative hypothesis
represents what you would want to conclude
if you decide that the null
hypothesis is not true.
So how do we decide?
What makes us say that the
null hypothesis is false?
Well let's take a closer look.
If the new rate of Ds, Fs, and withdrawals were 2%,
we would probably say that the redesign has worked.
If the new rate were 19.9%, we might not be as inclined to say
that because the difference is easily attributable to chance.
Our goal is not to determine whether the sample proportion
differs from the hypothesized value of the proportion;
it probably will.
The question is whether it differs from the hypothesized
proportion by a statistically significant amount.
If it does, we will conclude that the
null hypothesis is likely false.
So we need to satisfy some conditions
before we can perform a hypothesis test.
So when can we use the procedures
that we're about to describe?
Well, just as with the other procedures
we've seen for proportions,
we need to verify some conditions.
So let's let p0 denote the hypothesized
value of the population proportion
and let's let n denote the sample size.
Then what we need is for n times our
hypothesized proportion to be at least 10,
and for n times 1 minus our
hypothesized proportion to be at least 10.
If this condition holds, then we can approximate
the distribution of the sample proportion
with a normal distribution that has mean p0 and
standard deviation the square root of p0 times 1 minus p0 over n.
Why is this a standard deviation in this example and not
a standard error like it was for confidence intervals?
Well, here we're not estimating the population
proportion, we're assuming it has a particular value.
Standard error's a term that we reserve for
when we estimate the standard deviation.
Let's return to our example.
If the null hypothesis is true, then 20% of students
are getting Ds, Fs, or withdrawing from the course.
Before we can proceed, we first need to
check the success/failure condition.
Our hypothesized value
of the proportion is 0.2.
We sampled 400 students,
so we have n p0 equals 400 times 0.2,
which is 80, which is much bigger than 10,
and then we have n times 1 minus p0 equals 400
times 0.8, or 320, which is also much bigger than 10.
So we're good on the success/failure condition.
What that means is that our sample
proportion has a normal distribution
with mean p0 equals 0.2 and standard deviation the
square root of p0 times 1 minus p0 over n,
or the square root of 0.2 times 0.8
over 400, which is 0.02.
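As a quick illustrative sketch (my own, not part of the original lecture), the condition check and the standard deviation computation look like this in Python:

```python
from math import sqrt

p0 = 0.2  # hypothesized proportion under H0
n = 400   # sample size

# Success/failure condition: both expected counts must be at least 10
print(n * p0, n * (1 - p0))  # 80.0 320.0 -- both well above 10

# Standard deviation of the sampling distribution of p-hat under H0
sd = sqrt(p0 * (1 - p0) / n)
print(round(sd, 4))  # 0.02
```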
The question becomes then, is
the sample proportion of 0.17
rare given the assumption that
the population proportion is 0.2?
So we answer this question by finding the probability of observing
something as extreme as or more extreme than what we observed,
which means as far or farther away from the hypothesized
proportion in the direction of the alternative hypothesis.
So let's try it.
If the true population proportion is 0.2, then we want
the probability that p-hat is less than or equal to 0.17.
We can translate that to a z-score,
and we get the probability that
z is less than or equal to 0.17 minus 0.2 divided
by 0.02, since 0.02 is our standard deviation,
which is equal to the probability that
z is less than or equal to minus 1.5.
Remember that z is a normal
(0,1) random variable.
So using the table we find that this probability is equal to the
probability that z is bigger than or equal to positive 1.5
or 1 minus the probability that z is
less than or equal to 1.5 which is 0.067
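This calculation can be reproduced with a short Python sketch (mine, not from the lecture), building the standard normal CDF from math.erf rather than looking it up in a table:

```python
from math import erf, sqrt

def normal_cdf(x):
    """CDF of the standard normal distribution."""
    return 0.5 * (1 + erf(x / sqrt(2)))

p_hat = 0.17  # observed sample proportion
p0 = 0.2      # hypothesized proportion under H0
sd = 0.02     # standard deviation of p-hat under H0 (computed earlier)

z = (p_hat - p0) / sd    # test statistic
p_value = normal_cdf(z)  # left-tailed p-value: P(Z <= z)
print(round(z, 2), round(p_value, 3))  # -1.5 0.067
```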
So what this means is that if the null hypothesis were true, then
our sample proportion would be 0.17 or smaller 6.7% of the time.
It's up to the investigator to decide whether this is compelling
evidence to conclude that the redesign has been a success.
We can view our hypothesis test
as kind of like a criminal trial.
In US criminal law, a defendant
is innocent until proven guilty.
So we can think of a null hypothesis
as a presumption of innocence.
The data are the evidence.
And the data are judged and evaluated in order to
determine whether the null hypothesis is false.
Just like a jury might decide whether the evidence against the
defendant would be plausible if the defendant were actually innocent.
At this point, it's up to the
jury to make the decision.
In hypothesis testing, the investigator
or the statistician is the jury.
Juries don't give a verdict of
innocent in a criminal trial.
What they say is "not guilty,"
and this is because they simply don't have enough
evidence to conclude that the defendant is guilty.
So this is not an acceptance of the presumption of innocence
but it's not a rejection of that presumption either.
In hypothesis testing, we do a similar thing.
We don't accept the null hypothesis;
we just do not reject it. We basically do
the same thing as the jury and say "not guilty."
Let's look at p-values.
These kinda answer the question of whether
or not the data that we observed are weird.
So what is a p-value?
How do we use it?
Well the p-value tells us how likely the data that we
observe are if the null hypothesis were in fact true.
Formally, the p-value is the probability of seeing
data like what we saw or something even more extreme
if the null hypothesis were true.
It's a measure of how surprised we are to observe
the data that we did if the null hypothesis is true.
We calculated a p-value in our example.
That p-value is 0.067.
Small p-values give evidence
against the null hypothesis.
In other words, that means we're surprised to see the data
that we saw if the null hypothesis were actually true.
So two possibilities arise, either the null
hypothesis is true and we saw something weird,
or the null hypothesis is false.
In statistics, we conclude the
latter when the p-value is small.
So let's formalize the
hypothesis testing procedure.
First off is to formulate the hypothesis.
We did this in the first example.
We said the null hypothesis was p equals 0.2 and we tested that
against the alternative hypothesis that p is less than .2
The second step is to decide what probability
model we're going to use to carry out the task.
In the example, we used the normal model.
The test that we use is called
the one-proportion z-test,
and in order to use this model, we need to make
sure that we have some conditions satisfied.
So let's look at the conditions.
The first is the randomization condition.
The observations have to be drawn at random.
In the redesign example, this
is stated in the problem.
Next is the 10% condition.
As mentioned before, this means that the sample
size should not exceed 10% of the population size.
In the redesign example, we're going to assume that more
than 4000 students took the course over the last year.
And we have the success/failure condition
which we've already addressed.
We've verified that that condition
holds as we went through the example.
Step three is the mechanics phase.
This is where we actually
calculate the test statistic.
Different types of tests have
different test statistics.
Now for the one-proportion z-test,
the z-score is the test statistic.
In the example, the test statistic
was z equals minus 1.5.
We'll encounter other test
statistics later on in the course.
But from the test statistic,
we then calculate the p-value.
In the example, our p-value was 0.067.
Next we make a conclusion.
In this step, we make a decision on whether
or not to reject the null hypothesis.
Once we've made this decision, the conclusion
needs to be stated in the context of the problem.
So if we decide to reject the
null hypothesis in the redesign example,
then what we would say is that we have
evidence that the redesign has been successful
in reducing the percentage
of Ds, Fs, and withdrawals.
Once this conclusion has been made,
the next step is to use the information to decide
how to proceed from a practical standpoint.
So do we continue with the
redesign or do we drop it?
In this example, the department will likely want
to continue with the redesign if they decide
to reject the null hypothesis.
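Putting the mechanics together, here is a minimal sketch of the whole one-proportion z-test in Python (a hypothetical helper of my own, not from the lecture; it assumes the randomization, 10%, and success/failure conditions have already been checked):

```python
from math import erf, sqrt

def normal_cdf(x):
    """CDF of the standard normal distribution."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def one_prop_ztest(p_hat, p0, n, tail="left"):
    """Return (z, p-value) for a one-proportion z-test.
    Assumes the test's conditions have already been verified."""
    sd = sqrt(p0 * (1 - p0) / n)  # standard deviation of p-hat under H0
    z = (p_hat - p0) / sd         # test statistic
    if tail == "left":            # H1: p < p0
        p_value = normal_cdf(z)
    elif tail == "right":         # H1: p > p0
        p_value = 1 - normal_cdf(z)
    else:                         # H1: p != p0 (two-tailed)
        p_value = 2 * (1 - normal_cdf(abs(z)))
    return z, p_value

# Redesign example: H0: p = 0.2 vs H1: p < 0.2, n = 400, p-hat = 0.17
z, p = one_prop_ztest(0.17, 0.2, 400, tail="left")
print(round(z, 2), round(p, 3))  # -1.5 0.067
```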
When do you reject the null hypothesis?
Well, typically we set what's called the significance
level; this is the threshold for p-values.
What we do is set a value alpha
such that if the p-value is less than or equal to that
value of alpha, we reject the null hypothesis.
Otherwise, we don't reject the null hypothesis.
So the significance level should
be set before step 3 begins.
Before you do any of the mechanics, you
should set your significance level.
In our example, if our significance level is 0.1, we would reject
the null hypothesis and conclude that the redesign has worked.
If the significance level were 0.05,
we would not reject the null hypothesis because this
threshold is smaller than the p-value that we observed.
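That decision rule is simple enough to state in code; a small sketch (mine, not the lecture's), using the example's p-value:

```python
p_value = 0.067  # p-value from the redesign example

for alpha in (0.10, 0.05):
    decision = "reject H0" if p_value <= alpha else "fail to reject H0"
    print(alpha, "->", decision)
# 0.1 -> reject H0
# 0.05 -> fail to reject H0
```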
Setting the significance level before
any mechanics are carried out
is kind of a check on ourselves:
it ensures that we're not choosing a
significance level in such a way that we
guarantee the test turns out the way we want.
So when we talk about a hypothesis
test, what things do we report?
Well, we report the decision
and we report the conclusion.
The significance level we can report, or we might choose
not to, but the p-value is necessary to report.
And the reason for that is, there's no need for a
significance level if the p-value is known,
because the p-value allows the reader or some other third
party to make his or her own decision
about whether or not the data
support a particular hypothesis.
So let's formally describe
how we compute p-values.
We have three possible alternative hypotheses:
p less than p0, p greater than p0, or p not equal to p0.
To compute the p-value, we always compute it
in the direction of the alternative hypothesis.
So let's let z be a normal random variable
with mean zero and standard deviation 1.
If the alternative hypothesis is p less than
p0, then the p-value is the probability
that this normal random variable takes a value
less than or equal to what we've observed.
We refer to this alternative as a left-tailed hypothesis.
If the alternative hypothesis is p greater than p0,
then the p-value is computed by taking the probability
that this normal random variable takes a
value at least as large as what we observed,
and we call this alternative
a right-tailed hypothesis.
If the alternative hypothesis is p not equal to p0,
in other words we're just looking for a difference,
the p-value is the probability that z is less than or equal
to the negative absolute value of what we observed,
plus the probability that z is at least as large
as the positive absolute value of what we observed.
Or we can just write that as 2 times the
probability that this normal random variable
is at least as large as the absolute value
of the test statistic we observed.
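Here is a small sketch (not from the lecture) of all three p-value formulas, using z = -1.75 and z = 1.75 as in the illustrations that follow:

```python
from math import erf, sqrt

def normal_cdf(x):
    """CDF of the standard normal distribution."""
    return 0.5 * (1 + erf(x / sqrt(2)))

# p-values for each alternative, using the z values from the slides
p_left  = normal_cdf(-1.75)                # H1: p < p0, observed z = -1.75
p_right = 1 - normal_cdf(1.75)             # H1: p > p0, observed z = 1.75
p_two   = 2 * (1 - normal_cdf(abs(1.75)))  # H1: p != p0, observed z = 1.75

print(round(p_left, 5), round(p_right, 5), round(p_two, 5))
# 0.04006 0.04006 0.08012
```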
So we put some illustrations in
here on the next three slides.
Here's an illustration of what the left-tail
test looks like, the left-tail p-value.
Suppose our alternative
hypothesis is p less than p0,
and we observe a test
statistic z equals minus 1.75
then the p-value is the probability that a normal (0,1)
random variable takes a value no larger than minus 1.75.
That value is 0.04006,
and that probability is shaded in the graph.
If we do an upper-tail test, or a right-tail
test, where the alternative is p greater than p0
and we observe a test
statistic value of 1.75,
then the p-value is the probability
that z is at least as large as 1.75.
This is also 0.04006, and that probability, in that region, is
shaded in the graph on the left-hand side of the slide.
And finally for the two-tailed alternative, the alternative
hypothesis is p not equal to p0 and we observed z equals 1.75
So the p-value is 2 times the probability that a normal (0,1)
random variable takes a value at least as large as 1.75,
and that probability is 0.08012.
So we have both regions shaded in the graph,
the left tail and the right tail,
and the probabilities in
those regions add up to 0.08012.
So we have a lot of things that could go wrong in hypothesis
testing so let's discuss some of the pitfalls to avoid.
Don't base your hypotheses on
what you see in the data.
Don't make your null hypothesis
the thing you want to show;
that's what your alternative
hypothesis is for.
Don't forget to check the conditions.
If you don't check the conditions
and they aren't actually satisfied,
then your hypothesis test
isn't going to work very well.
Do not accept the null hypothesis;
what we do is fail to reject it.
If you fail to reject the null hypothesis, don't think that a
larger sample size will be more likely to lead to rejection.
All samples are different, and a larger
one may or may not lead to a rejection.
So what did we do in this lecture?
Well, we talked about how to carry out a
hypothesis test for a population proportion.
We also described the conditions under
which this hypothesis test is appropriate
and we discussed some of the things that
can go wrong in hypothesis testing,
some mistakes that are commonly made.
So if you avoid these mistakes,
follow the mechanics,
and carry out all four steps as
described in the previous slides,
your hypothesis test will go well for
you and will give you valid conclusions.
This is the end of lecture 3 and I look
forward to seeing you all back for lecture 4.