00:02 Welcome back for lecture 8, in which we'll discuss inference for paired data. 00:06 So let's start with an example to motivate what we're gonna do here. 00:10 The question is: Do flexible work schedules reduce the demand for resources? The Lake County, Illinois, health department experimented with a flexible four-day work week. 00:19 So for a year, the department recorded the mileage driven by 11 field workers on an ordinary five-day work week. 00:26 Then it switched to a flexible four-day work week and recorded the mileage for another year. 00:31 So here are the data for the 11 people that they looked at. 00:34 So we have the five-day mileage and the four-day mileage for each person. 00:38 Now we wanna perform inference on the difference in the mean mileage. 00:42 So the question is: Can we use a two-sample t-test? No, we can't. Why is that? Because each set of measurements was taken on the same person, so each person has two observations taken on them. 00:56 This means that the two groups are not independent. 00:58 So we violate assumption 1 for the two-sample procedures. 01:02 So what do we do? We call these types of data paired data or matched pairs. 01:08 And one thing that we might think about doing is to look at the difference between the five-day and the four-day mileage for each individual and then perform inference on the differences. 01:18 Then what we essentially have is one observation for each individual, so we have one sample of independent observations. 01:27 So what we do then, once we have the differences, is analyze them with the same one-sample t-procedures that we discussed before, 01:37 as long as all the conditions for those procedures are satisfied. 01:40 So let's take the five-day minus the four-day mileage and re-frame our data in such a way that it just shows the differences. 01:48 So what we get is the data you see right here.
01:51 Each individual now has a single difference between the five-day and the four-day mileage. 01:56 In order to carry out the paired t-procedures, we have to have some conditions satisfied. 02:01 First of all, we need the paired data condition, 02:04 which simply says that our data come in matched pairs. 02:07 Second, we need the independence assumption. 02:10 So the differences have to be independent of each other, which is the case when they come from a random sample. 02:15 Third, the randomization condition. 02:17 The data must come from a random sample or from random assignment of groups. 02:21 So conditions two and three often take care of each other. 02:25 Four, the 10% condition. 02:27 The sample size has to be less than 10% of the population size. 02:31 And five, the nearly normal condition. 02:34 The differences have to show near normality in order to use the t-procedures for the paired data. 02:41 So how do we carry out a paired t-test? Well, it looks a lot like the one-sample t-test. 02:47 Let's let mu D be the population mean difference. 02:51 Then we hypothesize that mu D is equal to some hypothesized value, mu 0, versus one of the three standard alternatives: 02:58 that mu D is less than mu 0, that mu D is greater than mu 0, or that mu D is not equal to mu 0. 03:05 Let's talk about the mechanics. 03:07 We'll let S D be the sample standard deviation of the differences. 03:11 Then the test statistic is given by d bar minus mu 0, divided by the standard error of d bar, where d bar is the sample mean of the differences and the standard error of d bar is given by S D over the square root of the sample size. 03:28 Under the null hypothesis, the test statistic follows the t-distribution with n minus 1 degrees of freedom, just like it did for the one-sample t-procedures. 03:38 So let's do the example on the mileage data that we just looked at. 03:42 Do we have paired data? Yes, we do. 03:45 We have two observations on each individual so we can look at the differences. 03:50 Do we have independence?
The individuals are likely to be independent. 03:55 The randomization condition is not stated explicitly in the problem, but we're going to assume it. 04:02 For the 10% condition, the Lake County Health Department has more than 110 field workers, so we're good on the 10% condition. 04:10 Now to the right you see a histogram of the differences. 04:14 And so for the nearly normal condition, we have some problems with the normal assumption. 04:19 We have two peaks and a right skew. 04:21 So we have some problems with the nearly normal condition, 04:24 but in order to get a feel for the mechanics of the test, we're going to do it anyway. So let's look at the mechanics. 04:30 Well, first we need our summary statistics for the differences. 04:32 The sample mean difference is 982 miles. 04:36 The sample standard deviation is 1139.568 miles and our sample size is 11. 04:44 So in this test what we're assuming initially is that there's no difference between the mileage for the four-day and the five-day work week, 04:50 so we're gonna assume that mu D is zero. That's our null hypothesis. 04:55 Our test statistic then is d bar minus zero, divided by S D over the square root of n, or 982 divided by 1139.568 over the square root of 11, which gives us 2.858. The significance level is 5%, and what we're looking to do is to see if the five-day mileage is, on average, greater than the four-day mileage. 05:20 So we reject the null hypothesis if our test statistic takes a value greater than or equal to the critical value of t with 10 degrees of freedom at the 0.05 level, which is 1.812. 05:30 Our test statistic took a value of 2.858. 05:34 So we reject the null hypothesis and conclude that there is evidence to suggest that average mileage decreases during the four-day work week versus the five-day work week. 05:43 What if we want a confidence interval for the mean difference? Well, if the conditions for the paired t-test are met, then we can form a 100 times 1 minus alpha percent confidence interval for the mean difference in the following way.
05:55 We take d bar plus or minus t* with n minus 1 degrees of freedom times the standard error of the mean difference. 06:03 This is the same form as we had in the one-sample t-interval. 06:08 For the mileage example, we wanna construct the 95% confidence interval for the mean difference. 06:13 So using the table, we find that t* with 10 degrees of freedom is 2.228. 06:19 We found during the hypothesis test that d bar was 982 and the standard error of d bar was 343.593. 06:29 So when we form our confidence interval, we take 982 plus or minus 2.228 times 343.593. 06:38 And what that gives us is an interval of 216.475 up to 1,747.525 miles. 06:47 So what that tells us is that we are 95% confident that the average mileage for the five-day work week is between 216.475 and 1,747.525 miles higher than that for the four-day work week. 07:04 With the paired t-test, there are a bunch of things that can go wrong. 07:07 So here are some things that we want to avoid. 07:09 We don't want to use a two-sample t-test when we have paired data, because we know that our groups are not independent if we have paired data. 07:17 We don't want to use a paired t-procedure when the data are not paired. 07:21 So those first two things kinda go together. 07:25 Don't forget to look out for outliers. 07:27 These can indicate problems with the nearly normal assumption. 07:31 And do not use side-by-side boxplots or histograms to look for the difference between the means of the paired groups, because the measurements are not from two different groups. 07:39 So we're not doing this as we would for a two-sample t-test. 07:44 So what have we done in this lecture? Well, we examined the difference between paired data and the type of data that enables us to use a two-sample t-test. 07:53 We described how to carry out the paired t-test as well as how to construct a paired t confidence interval for the average difference in paired data.
08:04 We finished up by looking at some things that can go wrong and things that we wanna avoid when we use the paired t-procedures. 08:11 This is the end of lecture 8 and I look forward to seeing you back for lecture 9.
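As a quick check on the arithmetic in the mileage example, here is a minimal Python sketch that recomputes the paired t statistic and the 95% confidence interval. It works only from the summary statistics quoted in the lecture (d bar of 982, S D of 1139.568, n of 11), since the 11 individual differences are not reproduced in this transcript. The critical values 1.812 and 2.228 are the table values the lecture reads off for 10 degrees of freedom. The function and variable names are illustrative assumptions, not part of the lecture.

```python
import math

# Summary statistics quoted in the lecture (five-day minus four-day mileage).
n = 11            # number of field workers, so n - 1 = 10 degrees of freedom
d_bar = 982.0     # sample mean of the differences, in miles
s_d = 1139.568    # sample standard deviation of the differences, in miles
mu_0 = 0.0        # hypothesized mean difference under the null hypothesis

# Critical values taken from the lecture's t-table lookups (10 df).
t_crit_one_sided = 1.812   # upper 5% point, for the one-sided test
t_star_95 = 2.228          # two-sided value for a 95% confidence interval


def paired_t_statistic(d_bar, s_d, n, mu_0=0.0):
    """Return (standard error of d bar, t statistic) from summary statistics."""
    se = s_d / math.sqrt(n)
    t_stat = (d_bar - mu_0) / se
    return se, t_stat


def paired_t_interval(d_bar, se, t_star):
    """Return (lower, upper) bounds of d bar +/- t_star * se."""
    margin = t_star * se
    return d_bar - margin, d_bar + margin


se, t_stat = paired_t_statistic(d_bar, s_d, n, mu_0)
reject_null = t_stat >= t_crit_one_sided
lower, upper = paired_t_interval(d_bar, se, t_star_95)

print(f"standard error of d bar: {se:.3f}")
print(f"t statistic: {t_stat:.3f}")
print(f"reject H0 at the 5% level (one-sided): {reject_null}")
print(f"95% CI for the mean difference: ({lower:.3f}, {upper:.3f})")
```

With the individual five-day and four-day mileages in hand, the same steps would start by subtracting the two columns person by person and then computing d bar and S D from those differences.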
The lecture Inference for Paired Data by David Spade, PhD is from the course Statistics Part 2.