Welcome back for lecture 6 which
is all about inference for means
Let's start with a motivating example.
Psychologists report that high school and college
students are the most pathologically sleep deprived
segment of the population.
They say that college students require an average of
9.25 hours of sleep per night in order to be fully alert.
They randomly sampled 25 students from a small
school in the northeastern United States
and they asked them how much
they slept the previous night.
The data are given in the following table:
These are in hours.
The goal is to estimate the mean
amount slept by college students
and to determine whether it is less than
the minimum recommended amount of 7 hours.
First thing we wanna do is make a picture.
So here's a histogram of the sleep data.
So we have a few students with only a few hours
of sleep, and then a lot of 6s, 7s, and 8s
We have a central limit theorem
for means which we've seen before
We've also seen something of a central
limit theorem for proportions
So we can do a similar thing here.
Suppose we have a random sample of size
n from a population that has mean mu
and standard deviation sigma.
Suppose that the sample
mean is denoted by y bar.
Then regardless of the population distribution,
if the sample size is sufficiently large,
then the distribution of the sample
mean is approximately normal
with mean mu and standard deviation
sigma over the square root of n
The larger the sample size, the more closely
the normal distribution approximates
the sampling distribution for the mean.
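To see this in action, here's a quick simulation sketch (not part of the lecture data; the exponential population and all the parameters are just an illustration): even starting from a strongly skewed population, the sample means settle around mu with spread sigma over the square root of n.

```python
import numpy as np

# Hypothetical skewed population: exponential with mean 2 and
# standard deviation 2 (for the exponential, sigma equals mu).
rng = np.random.default_rng(42)
mu = sigma = 2.0
n, reps = 50, 100_000

# Draw many samples of size n and record each sample mean.
sample_means = rng.exponential(scale=mu, size=(reps, n)).mean(axis=1)

# The CLT says these means are approximately normal with mean mu
# and standard deviation sigma / sqrt(n), about 0.283 here.
print(sample_means.mean(), sample_means.std())
```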
We can run into issues with the central
limit theorem; mainly, when can we use it?
Well the main issue with this version
of the central limit theorem
is that we do not know the
population standard deviation
In statistics, if we don't know the value
of a quantity, we have to estimate it.
If we estimate the standard deviation, can
we still use the normal distribution?
Well the answer is no, we can't
quite use the normal distribution.
So what do we do instead?
Well, we use what's called the t-distribution.
To get there, we have to estimate
the standard deviation,
and the natural estimate of the standard deviation is
the sample standard deviation, which we'll denote s
Recall that if the standard deviation
of a population were known,
then the statistic z equals y bar minus mu divided
by sigma over the square root of the sample size
has the normal distribution
with mean 0 and variance 1
If we replace sigma by s then we
get a statistic that we'll call t
and it's equal to y bar minus mu
divided by s over the square root of n
And this has what's known as the t-distribution
with n minus 1 degrees of freedom
This holds provided that the data
come from a normal distribution
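As a sanity check (a sketch with simulated data, not the lecture's sleep sample), the t statistic computed by hand from this formula matches what scipy's one-sample t-test reports:

```python
import numpy as np
from scipy import stats

# Simulated sample of 25 observations from a normal population.
rng = np.random.default_rng(0)
y = rng.normal(loc=7.0, scale=1.0, size=25)

# t = (ybar - mu) / (s / sqrt(n)), with s the sample standard deviation.
n = len(y)
t_manual = (y.mean() - 7.0) / (y.std(ddof=1) / np.sqrt(n))

# scipy computes the same statistic, with n - 1 degrees of freedom.
t_scipy = stats.ttest_1samp(y, popmean=7.0).statistic
print(t_manual, t_scipy)  # identical up to floating-point rounding
```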
So here's a picture of what a t-distribution looks like
and now we're gonna look at some of the properties.
It looks a lot like the normal distribution
with mean zero and standard deviation 1
However the tails of this distribution are a little bit
thicker than they are for the normal distribution,
and it has a slightly lower, broader peak
than does the normal distribution
and as the degrees of freedom increase,
the t-distribution looks more and more
like the normal distribution with
mean 0 and standard deviation 1.
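Those thicker tails are easy to quantify with software (a small sketch, assuming scipy is available): the probability of falling beyond 2 is much bigger for a t with few degrees of freedom than for the standard normal, and it shrinks toward the normal value as the degrees of freedom grow.

```python
from scipy import stats

# P(T > 2) for t-distributions with increasing degrees of freedom,
# compared with P(Z > 2) for the standard normal.
for df in (3, 10, 30, 100):
    print(df, stats.t.sf(2, df))    # tail probability shrinks as df grows
print("normal", stats.norm.sf(2))   # about 0.0228
```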
If we wanna create a confidence interval for the population
mean, we're gonna need a statistic that has a t-distribution
So how are we going to do it?
Well let's recall that the test
statistic t equals y bar minus mu
divided by s over the square root of n has a
t-distribution with n minus 1 degrees of freedom
which from here on out, we're gonna
abbreviate t subscript n minus 1
Let's let the standard error of y bar
be equal to s over the square root of n
Then the 100 times (1 minus alpha) percent
confidence interval for the population mean,
is given by this formula down here, the sample mean
plus or minus critical t times the standard error.
So it looks a lot like what we've done with the z-intervals
before, but now we're replacing the z with a t
The critical t value depends on the
confidence level and the degrees of freedom.
So here's the table of the t distribution and
this is where we can find critical values.
So let's look at how to use it.
For a 100 times 1 minus alpha percent confidence interval,
let's find alpha over two in the top margin of the table
so for example for a 95% confidence interval, alpha is
.05 so we're gonna look in the top of the table for .025
Next, look in the left-hand margin and
find the appropriate degrees of freedom.
If you can't find the exact degrees of freedom,
then choose the closest value that is smaller
than the degrees of freedom
that you actually have.
So for example for a 95% confidence interval with
a sample size of 25, there are 24 degrees of freedom.
We've looked in the table, we see that the
corresponding critical value is 2.060.
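If you don't have a table handy, software will produce the same critical values. Here's a sketch (the 90% level is just an illustration, not the lecture's interval):

```python
from scipy import stats

# For a 100(1 - alpha)% confidence interval with df degrees of freedom,
# the critical value is the (1 - alpha/2) quantile of the t-distribution.
# Example lookup: a 90% interval with 24 degrees of freedom.
alpha, df = 0.10, 24
t_crit = stats.t.ppf(1 - alpha / 2, df)
print(round(t_crit, 3))  # 1.711, matching the table entry
```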
In order to use the one-sample t interval for the
mean, we need to have three conditions satisfied
First, we need the data to be random,
they have to come from a random sample.
Second, the 10% condition.
The sample size has to be smaller
than 10% of the population
Third, nearly normal.
The data have to come from a distribution that appears to
be unimodal and symmetric with no outliers or strong skew.
For very small sample sizes, say maybe smaller than 15, the
histogram should look like the normal distribution pretty closely.
For moderate sample sizes,
perhaps between 15 and 40,
we can get away with using the t methods if the
data are unimodal and reasonably symmetric,
slight skew's okay, we can handle that.
For large sample sizes or
sample sizes bigger than 40,
the methods are pretty safe unless
the histogram is severely skewed.
So let's apply the t-methods to
get a 95% confidence interval
for the mean number of hours
that college students slept.
So we'll go step by step, let's
first check the conditions.
First, the randomization condition.
The data are assumed to be from a random sample
as stated in the problem so we're good there.
Second, the 10% condition.
25 is way less than 10% of the population of
college students so we're good there.
Third, the nearly normal condition.
The histogram indicated a unimodal
distribution with a slight skew to the left
but it's not much of a concern
because the sample size is 25
So it fits in that 15-40 range where
we can deal with a little bit of skew.
From the data, we can calculate the
mean and the standard deviation,
so we do that and we get y bar
equals 6.64 and s equals 1.0755
So we're gonna use this information
to create a 95% confidence interval
for the mean number of hours of sleep
that college students got that night
So here's the interval.
We find that the standard error of y bar is
1.0755 divided by the square root of 25 or 0.2151
We know that from the example of using the t-table
that if we want to find a 95% confidence interval,
and we have 24 degrees of freedom,
then our critical t-value is 2.060
Then our interval is y bar plus or minus
our critical t times the standard error
or 6.64 plus or minus 2.060 times 0.2151, which
gives us an interval of 6.1969 up to 7.0831 hours
So what does that tell us?
What that tells us is that we're 95% confident
that the average college student sleeps
between 6.1969 and 7.0831 hours per night
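If you want to verify the arithmetic, here's a quick sketch using just the summary statistics and the critical value from the lecture's table:

```python
import math

# Summary statistics from the sleep example.
ybar, s, n = 6.64, 1.0755, 25
t_crit = 2.060  # critical value used in the lecture for the 95% interval

se = s / math.sqrt(n)          # standard error of y bar
lower = ybar - t_crit * se
upper = ybar + t_crit * se
print(round(se, 4), round(lower, 4), round(upper, 4))  # 0.2151 6.1969 7.0831
```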
Now let's look at the one-sample
t-test for the mean
We already know enough about
hypothesis testing to construct it.
So we're simply gonna
outline the steps here.
So the first step, as it's always
been: state the hypotheses.
Null hypothesis: the population mean is
equal to some hypothesized value
Alternative: population mean less
than, population mean greater than
or population mean not equal
to that hypothesized value.
Two, check conditions.
The randomization condition, the 10%
condition and the nearly normal condition
These conditions are the same as they
were for the confidence interval
Third, find the test statistic.
Use t equals y bar minus the hypothesized value of the population
mean divided by the standard error of the sample mean.
This has a t distribution with n minus 1 degrees
of freedom if the null hypothesis is true
Four, use the table to find the
critical t value for the test
And five, state the
decision and the conclusion
So let's do an example
We're gonna test the hypothesis that the average
college student sleeps 7 hours per night
against the alternative hypothesis that the average
college student sleeps less than 7 hours per night.
Let's let mu be the mean number of hours
a college student sleeps each night,
we're gonna test it at the
5% significance level.
So here's the test, our hypothesis.
Null hypothesis: mu equals 7
alternative hypothesis: mu less than 7.
We checked these conditions when we
constructed the confidence interval
and since they're the same conditions for
the test as they are for the interval,
we know that they are satisfied so we
can carry on with the t-test procedure
The test statistic, t equals 6.64 minus 7
divided by the standard error which is 0.2151
so we get a t-value of minus 1.6736
Now let's look at the critical
regions for our t-test
For a level alpha test, the rejection regions are dependent
on the alternative hypothesis and they're as follows:
If the alternative is mu less than mu0, then we will reject
if our t-statistic is less than minus t sub n minus 1, alpha
For the alternative mu greater than
mu0, we reject the null hypothesis
if our test statistic takes a value
larger than t sub n minus 1, alpha
And if our alternative hypothesis
is mu not equal to mu0,
then we're gonna reject the null hypothesis
if the absolute value of our test statistic
is larger than t sub n minus 1, alpha over 2
and we can find these values in the table
as we did for the confidence interval.
So these values are again found by matching up
the degrees of freedom in the left hand margin,
to the probabilities in the top margin and then
finding the value in the corresponding cell.
For our test, we're going to reject H0 if t is less than
minus t with 24 degrees of freedom and probability .05
which from the table we
can find is minus 1.711
Our test statistic is minus 1.6736
so since this is not less than the critical t of
minus 1.711, we don't reject the null hypothesis.
We conclude that there is no evidence at the 5% level that the
average college student sleeps fewer than 7 hours per night.
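The same kind of sketch reproduces the test statistic and the decision (scipy's t quantile is the software version of the table lookup):

```python
import math
from scipy import stats

# Summary statistics and the hypothesized mean.
ybar, s, n, mu0 = 6.64, 1.0755, 25, 7.0
alpha = 0.05

se = s / math.sqrt(n)
t_stat = (ybar - mu0) / se            # observed test statistic
t_crit = stats.t.ppf(alpha, n - 1)    # lower-tail critical value, 24 df

reject = bool(t_stat < t_crit)        # one-sided "less than" test
print(round(t_stat, 4), round(t_crit, 3), reject)  # -1.6736 -1.711 False
```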
P-values are really hard
to find from the table.
You can get ranges of p-values, but we're not gonna do that;
we're gonna rely on rejection regions here for t-tests
Sometimes, we want a particular margin
of error for a confidence interval
so we need to find a sample size that's going
to accomplish that given the confidence level.
So let's recall the one-sample t-interval
The margin of error is given by t-star
sub n minus 1 times the standard error,
or t-star sub n minus 1 times s
over the square root of n
as we did before, if we want to attain a particular
margin of error for a given level of confidence,
we'll set the margin of error equal to the one
that we want and then solve for the sample size,
we end up with the following formula:
the desired sample size is equal to t-star sub n minus 1 times
the sample standard deviation divided by the margin of error,
and that whole thing gets squared
In our example, in order to attain
the margin of error of .05,
we need a sample size of n equals 2.060 times
1.0755 divided by .05, we square that whole thing,
and we get n equals 1963.43
In order to be conservative, just like we did before, we
round up to 1964 so what we need is 1964 observations
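That sample-size arithmetic, as a quick sketch:

```python
import math

# s and the critical value from the example, plus the target margin of error.
t_crit, s, me = 2.060, 1.0755, 0.05

n_exact = (t_crit * s / me) ** 2    # (t* s / ME)^2
n_needed = math.ceil(n_exact)       # round up to be conservative
print(round(n_exact, 2), n_needed)  # 1963.43 1964
```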
So what can go wrong with
inference for means?
Well let's be sure to avoid
the following things:
Don't confuse means and proportions.
We wanna beware of data that have
multiple modes and strong skew
because those kinds of data are clearly
not from a normal distribution
We wanna watch out for outliers
and we wanna watch out for bias as well.
We need to make sure that our observations are
randomized; otherwise, we can't use the t-test procedure.
And finally, we wanna interpret our
confidence intervals correctly.
So in this lecture, what have we done?
Well, we've switched gears
from proportions to means
and we learned about how to construct
confidence intervals for means,
how to interpret confidence
intervals for means.
We also learned how to carry out
the one-sample t-test for a mean
and we discussed the things that can go
wrong and the things that we wanna avoid.
So this is the end of lecture 6 and I look
forward to seeing you again for lecture 7.