Playlist

Inference for Means

by David Spade, PhD

    Learning Material 2
    • PDF
      Slides Statistics pt2 Inference for Means.pdf
    • PDF
      Download Lecture Overview
    Transcript

    00:01 Welcome back for lecture 6, which is all about inference for means. Let's start with a motivating example.

    00:08 Psychologists report that high school and college students are the most pathologically sleep deprived segment of the population.

    00:14 They say that college students require an average of 9.25 hours of sleep per night in order to be fully alert.

    00:21 They randomly sampled 25 students from a small school in the northeastern United States and they asked them how much they slept the previous night.

    00:29 The data, in hours, are given in the following table.

    00:35 The goal is to estimate the mean amount of sleep college students get and to determine whether it is less than the minimum recommended amount of 7 hours.

    00:43 First thing we wanna do is make a picture.

    00:46 So here's a histogram of the sleep data.

    00:49 So we have a few that have a few hours of sleep and then a lot of 6s, 7s, and 8s. We have a central limit theorem for means, which we've seen before. We've also seen something of a central limit theorem for proportions, so we can do a similar thing here.

    01:04 Suppose we have a random sample of size n from a population that has mean mu and standard deviation sigma.

    01:11 Suppose that the sample mean is denoted by y bar.

    01:14 Then regardless of the population distribution, if the sample size is sufficiently large, the distribution of the sample mean is approximately normal with mean mu and standard deviation sigma over the square root of n. The larger the sample size, the more closely the normal distribution approximates the sampling distribution of the mean.
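The central limit theorem described above can be checked with a short simulation. The sketch below (standard library only; the exponential population, seed, and sample sizes are just illustrative choices, not from the lecture) draws many samples from a clearly non-normal population and shows that the sample means behave as the theorem predicts.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Draw many samples of size n from a clearly non-normal population:
# an exponential distribution with mean mu = 1 and sd sigma = 1.
n = 40
sample_means = [
    statistics.mean(random.expovariate(1.0) for _ in range(n))
    for _ in range(5000)
]

# By the CLT, the sample means should center near mu = 1 with standard
# deviation close to sigma / sqrt(n) = 1 / sqrt(40), roughly 0.158.
print(statistics.mean(sample_means))
print(statistics.stdev(sample_means))
```

Increasing `n` pulls the standard deviation of the sample means down by the square root of the sample size, which is exactly the lecturer's point.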

    01:36 We can run into issues with the central limit theorem, mainly: when can we use it? Well, the main issue with this version of the central limit theorem is that we do not know the population standard deviation. In statistics, if we don't know the value of a quantity, we have to estimate it.

    01:53 If we estimate the standard deviation, can we still use the normal distribution? Well the answer is no, we can't quite use the normal distribution.

    02:02 So what do we do instead? Well, we use what's called the t-distribution. We have to estimate the standard deviation, and the natural estimate is the sample standard deviation, which we'll denote s. Recall that if the standard deviation of the population were known, then the statistic z equals y bar minus mu, divided by sigma over the square root of the sample size, has the normal distribution with mean 0 and variance 1. If we replace sigma by s, then we get a statistic that we'll call t, equal to y bar minus mu divided by s over the square root of n, and this has what's known as the t-distribution with n minus 1 degrees of freedom. This holds provided that the data come from a normal distribution. So here's a picture of what a t-distribution looks like, and now we're gonna look at some of its properties.

    02:57 It looks a lot like the normal distribution with mean 0 and standard deviation 1. However, the tails of this distribution are a little bit thicker than they are for the normal distribution, its peak is slightly lower than the normal's, and as the degrees of freedom increase, the t-distribution looks more and more like the normal distribution with mean 0 and standard deviation 1.

    03:21 If we wanna create a confidence interval for the population mean, we're gonna need a statistic that has a t-distribution. So how are we going to do it? Well, recall that the statistic t equals y bar minus mu, divided by s over the square root of n, has a t-distribution with n minus 1 degrees of freedom, which from here on out we're gonna abbreviate t subscript n minus 1. Let the standard error of y bar be equal to s over the square root of n. Then the 100 times (1 minus alpha) percent confidence interval for the population mean is given by this formula down here: the sample mean plus or minus critical t times the standard error.

    04:03 So it looks a lot like what we've done with the z-intervals before, but now we're replacing the z with a t. The critical t value depends on the confidence level and the degrees of freedom.

    04:14 So here's the table of the t distribution and this is where we can find critical values.

    04:19 So let's look at how to use it.

    04:21 For a 100 times (1 minus alpha) percent confidence interval, find alpha over 2 in the top margin of the table. So, for example, for a 95% confidence interval, alpha is 0.05, so we're gonna look in the top of the table for 0.025. Next, look at the left-hand margin and find the appropriate degrees of freedom.

    04:43 If you can't find the exact degrees of freedom, then choose the closest value that is smaller than the degrees of freedom that you actually have.

    04:51 So for example for a 95% confidence interval with a sample size of 25, there are 24 degrees of freedom.

    04:58 We look in the table, and we see that the corresponding critical value is 2.060.

    05:05 In order to use the one-sample t-interval for the mean, we need to have three conditions satisfied. First, randomization: the data have to come from a random sample.

    05:15 Second, the 10% condition: the sample size has to be smaller than 10% of the population. Third, nearly normal.

    05:24 The data have to come from a distribution that appears to be unimodal and symmetric with no outliers or strong skew.

    05:31 For very small sample sizes, say maybe smaller than 15, the histogram should look like the normal distribution pretty closely.

    05:40 For moderate sample sizes, perhaps between 15 and 40, we can get away with using the t methods if the data are unimodal and reasonably symmetric, slight skew's okay, we can handle that.

    05:52 For large sample sizes, or sample sizes bigger than 40, the methods are pretty safe unless the histogram is severely skewed.

    06:00 So let's apply the t-methods to get a 95% confidence interval for the mean number of hours that the college students slept.

    06:07 So we'll go step by step, let's first check the conditions.

    06:11 First, the randomization condition.

    06:13 The data are assumed to be from a random sample as stated in the problem so we're good there.

    06:18 Second, the 10% condition.

    06:21 25 is way less than 10% of the population of college students so we're good there.

    06:26 The nearly normal condition: the histogram indicated a unimodal distribution with a slight skew to the left, but that's not much of a concern because the sample size is 25, so it fits in that 15-40 range where we can deal with a little bit of skew.

    06:40 From the data, we can calculate the mean and the standard deviation; we do that and we get y bar equals 6.64 and s equals 1.0755. We're gonna use this information to create a 95% confidence interval for the mean number of hours that college students slept that night. So here's the interval.

    07:01 We find that the standard error of y bar is 1.0755 divided by the square root of 25, or 0.2151. We know from the example of using the t-table that if we want a 95% confidence interval and we have 24 degrees of freedom, then our critical t-value is 2.060. Then our interval is y bar plus or minus the critical t times the standard error, or 6.64 plus or minus 2.060 times 0.2151, which gives us an interval of 6.1969 up to 7.0831 hours. So what does that tell us? It tells us that we're 95% confident that the average college student sleeps between 6.1969 and 7.0831 hours per night. Now let's look at the one-sample t-test for the mean. We already know enough about hypothesis testing to construct it.
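As a quick arithmetic check, the interval can be reproduced in a few lines of code (a sketch using the sample statistics and the critical value 2.060 quoted from the lecture's table):

```python
import math

ybar, s, n = 6.64, 1.0755, 25   # sample mean, sample sd, sample size
t_crit = 2.060                  # critical t for 95% confidence, df = 24 (from the table)

se = s / math.sqrt(n)           # standard error of the sample mean
lo = ybar - t_crit * se         # lower confidence limit
hi = ybar + t_crit * se         # upper confidence limit
print(round(se, 4), round(lo, 4), round(hi, 4))   # 0.2151 6.1969 7.0831
```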

    07:59 So we're just gonna outline the steps here.

    08:02 So the first step, as it's always been: state the hypotheses.

    08:06 Null hypothesis: the population mean is equal to some hypothesized value. Alternative: the population mean is less than, greater than, or not equal to that hypothesized value.

    08:21 Two, check conditions.

    08:23 The randomization condition, the 10% condition, and the nearly normal condition. These conditions are the same as they were for the confidence interval. Third, find the test statistic.

    08:33 Use t equals y bar minus the hypothesized value of the population mean divided by the standard error of the sample mean.

    08:41 This has a t-distribution with n minus 1 degrees of freedom if the null hypothesis is true. Four, use the table to find the critical t value for the test. And five, state the decision and the conclusion. So let's do an example. We're gonna test the hypothesis that the average college student sleeps 7 hours per night against the alternative hypothesis that the average college student sleeps less than 7 hours per night.

    09:07 Let's let mu be the mean number of hours a college student sleeps each night; we're gonna test at the 5% significance level.

    09:15 So here's the test, our hypotheses.

    09:17 Null hypothesis: mu equals 7 alternative hypothesis: mu less than 7.

    09:24 Conditions.

    09:25 We checked these conditions when we constructed the confidence interval, and since they're the same for the test as for the interval, we know they're satisfied, so we can carry on with the t-test procedure. The test statistic: t equals 6.64 minus 7, divided by the standard error, 0.2151, so we get a t-value of minus 1.6736. Now let's look at the critical regions for our t-test. For a level alpha test, the rejection regions depend on the alternative hypothesis, and they're as follows. If the alternative is mu less than mu0, then we reject if our t-statistic is less than minus t, n minus 1, alpha. For the alternative mu greater than mu0, we reject the null hypothesis if our test statistic takes a value larger than t, n minus 1, alpha. And if our alternative hypothesis is mu not equal to mu0, then we're gonna reject the null hypothesis if the absolute value of our test statistic is larger than t, n minus 1, alpha over 2. We can find these values in the table as we did for the confidence interval.

    10:35 So these values are again found by matching up the degrees of freedom in the left-hand margin to the probabilities in the top margin and then finding the value in the corresponding cell.

    10:43 For our test, we're going to reject H0 if t is less than minus t with 24 degrees of freedom and upper-tail probability 0.05, which from the table is minus 1.711. Our test statistic is minus 1.6736, and since this is not less than the critical value of minus 1.711, we don't reject the null hypothesis.
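The whole lower-tailed test can be sketched in code; the critical value 1.711 is the lecture's table value for df = 24 and alpha = 0.05.

```python
import math

# One-sample lower-tailed t-test: H0: mu = 7 vs H1: mu < 7
ybar, mu0, s, n = 6.64, 7.0, 1.0755, 25
t_crit = 1.711                     # t_{24, 0.05} from the table

se = s / math.sqrt(n)              # standard error: 0.2151
t_stat = (ybar - mu0) / se         # about -1.6736
reject = t_stat < -t_crit          # reject H0 only if t < -1.711
print(round(t_stat, 4), reject)    # -1.6736 False
```

Since `reject` is `False`, the code reaches the same conclusion as the lecture: no evidence at the 5% level that the mean is below 7 hours.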

    11:08 We conclude that there is no evidence at the 5% level that the average college student sleeps fewer than 7 hours per night.

    11:16 P-values are really hard to find from the table.

    11:18 You can get ranges of p-values, but we're not gonna do that; we're gonna rely on rejection regions for t-tests here. Sometimes we want a particular margin of error for a confidence interval, so we need to find a sample size that's going to accomplish that at a given confidence level.

    11:34 So let's recall the one-sample t-interval. The margin of error is given by t*, n minus 1, times the standard error, or t*, n minus 1, times s over the square root of n. As we did before, if we want to attain a particular margin of error at a given level of confidence, we set the margin of error equal to the one we want and solve for the sample size. We end up with the following formula: the desired sample size is equal to t*, n minus 1, times the sample standard deviation divided by the margin of error, that whole thing squared. In our example, in order to attain a margin of error of 0.05, we need a sample size of n equals 2.060 times 1.0755 divided by 0.05, square that whole thing, and we get 1963.43. In order to be conservative, just like we did before, we round up to 1964, so what we need is 1964 observations. So what can go wrong with inference for means? Well, let's be sure to avoid the following things. Don't confuse means and proportions.
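Backing up a step, that sample-size calculation can be sketched the same way, using `math.ceil` to round up and stay conservative:

```python
import math

t_crit = 2.060   # critical t used in the lecture (df = 24, 95% confidence)
s = 1.0755       # sample standard deviation of the sleep data
me = 0.05        # desired margin of error, in hours

# n = (t* . s / ME)^2, rounded up: (2.060 * 1.0755 / 0.05)^2 = 1963.43
n_needed = math.ceil((t_crit * s / me) ** 2)
print(n_needed)  # 1964
```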

    12:44 We wanna beware of data that have multiple modes or strong skew, because those types of data are clearly not from a normal distribution. We wanna watch out for outliers, and we wanna watch out for bias as well.

    12:58 We need to make sure that our observations are randomized; otherwise, we can't use the t-test procedure.

    13:03 And finally, we wanna interpret our confidence intervals correctly.

    13:06 So in this lecture, what have we done? Well, we've switched gears from proportions to means, and we learned how to construct confidence intervals for means and how to interpret them.

    13:17 We also learned how to carry out the one-sample t-test for a mean, and we discussed the things that can go wrong and the things we wanna avoid.

    13:26 So this is the end of lecture 6 and I look forward to seeing you again for lecture 7.


    About the Lecture

    The lecture Inference for Means by David Spade, PhD is from the course Statistics Part 2. It contains the following chapters:

    • Inference for Means
    • The t-Distribution
    • Three Conditions
    • Using Margin of Error
    • Pitfalls to Avoid

    Included Quiz Questions

    1. The distribution of the sample mean is more closely normal with larger sample sizes
    2. The distribution of the sample mean is more normal with smaller sample sizes
    3. In order for the Central Limit Theorem to apply, the population must be normal
    4. The standard deviation of the sample mean increases as the sample size increases
    1. The population standard deviation must be known
    2. The data must come from a normal population
    3. The population standard deviation is estimated with the sample standard deviation
    4. The test statistic is computed in the same way as the z-statistic from previous procedures, but the population standard deviation is estimated
    1. It has thinner tails than the normal distribution
    2. It is more peaked than the normal distribution
    3. It has thicker tails than the normal distribution
    4. As the degrees of freedom increase, the t-distribution looks more and more like the normal distribution
    1. The larger the sample size, the more unimodal and symmetric the histogram must look in order to use the t-interval
    2. The data must come from a random sample
    3. The data comes from a distribution that appears to be unimodal and symmetric, with no outliers or strong skew
    4. The sample size must be smaller than 10% of the population size
    1. It is poor practice to use the one-sample t-procedure with non-randomized data
    2. It is poor practice to watch out for outliers
    3. It is poor practice to beware of data with multiple modes and strong skew
    4. It is poor practice to watch out for biased data
    1. 8
    2. 5
    3. 6
    4. 7
    5. 9
    1. H1: µ ≠ 5
    2. H1: µ < 5
    3. H1: µ > 5
    4. H1: µ ≤ 5
    5. H1: µ Δ 5
    1. 30
    2. 10
    3. 20
    4. 40
    5. 50

    Author of lecture Inference for Means

     David Spade, PhD


