Inference for Paired Data

by David Spade, PhD


    00:02 Welcome back for lecture 8 in which we'll discuss inference for paired data.

    00:06 So let's start with an example to motivate what we're gonna do here.

    00:10 The question is: Do flexible work schedules reduce the demand for resources? The Lake County, Illinois, health department experimented with a flexible four-day work week.

    00:19 So for a year, the department recorded the mileage driven by 11 field workers on an ordinary five-day work week.

    00:26 Then it switched to a flexible four-day work week and recorded the mileage for another year.

    00:31 So here are the data for the 11 people that they looked at.

    00:34 So we have the five-day mileage and the four-day mileage for each person.

    00:38 Now we wanna perform inference on the difference in the mean mileage.

    00:42 So the question is: Can we use a two-sample t-test? No, we can't. Why is that? Because each pair of measurements was taken on the same person, so each person has two observations taken on them.

    00:56 This means that the two groups are not independent.

    00:58 So we violate assumption 1 for the two sample procedures.

    01:02 So what do we do? We call these types of data paired data or matched pairs.

    01:08 And one thing that we might think about doing is to look at the difference between the five-day and the four-day mileage for each individual and then perform inference on the differences.

    01:18 Then we essentially have one observation for each individual, so we have one sample of independent observations.

    01:27 So once we have the differences, we analyze them by applying the one-sample t-procedures that we discussed before.

    01:37 As long as all the conditions for that are satisfied.

    01:40 So let's take the five-day minus the four-day mileage and re-frame our data in such a way that it just shows the differences.

    01:48 So what we get is the data you see right here.

    01:51 Each individual with a difference in the five-day and the four-day mileage.
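
    The step of turning two columns of paired measurements into one column of differences can be sketched in a few lines of Python. The mileage values below are hypothetical placeholders for three workers, not the lecture's data (the actual table is on the slides), and the variable names are assumptions made for illustration.

```python
# Hypothetical mileages for three workers (NOT the lecture's data).
# Position i in both lists refers to the same worker, which is what
# makes the data paired.
five_day = [8000, 7500, 9200]
four_day = [7400, 7600, 8100]

# One difference per worker: five-day minus four-day, as in the lecture.
differences = [f5 - f4 for f5, f4 in zip(five_day, four_day)]

print(differences)  # [600, -100, 1100]
```

    After this step there is a single sample of differences, so the one-sample t-procedures can be applied to it.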

    01:56 In order to carry out the paired t-procedures, we have to have some conditions satisfied.

    02:01 First of all, we need the paired data condition.

    02:04 Which simply says that our data come in matched pairs.

    02:07 Second, we need the independence assumption.

    02:10 So the differences have to be independent, so they have to come from a random sample.

    02:15 Third, the randomization condition.

    02:17 The data must come from a random sample or from random assignment to groups.

    02:21 So three and two often take care of each other.

    02:25 Four, the 10% condition.

    02:27 The sample size has to be less than 10% of the population size.

    02:31 And five, the nearly normal condition.

    02:34 The differences have to show near normality in order to use the t-procedures for the paired data.

    02:41 So how do we carry out a paired t-test? Well, it looks a lot like the one-sample t-test.

    02:47 Let's let mu D be the population mean difference.

    02:51 Then we hypothesize that mu D is equal to some hypothesized value, mu 0, versus one of the three standard alternatives.

    02:58 That mu D is less than mu 0, mu D greater than mu 0 or mu D is not equal to mu 0.

    03:05 Let's talk about the mechanics.

    03:07 We'll let S D be the sample standard deviation for the differences.

    03:11 Then the test statistic is given by d bar minus mu 0 divided by the standard error of d bar, where d bar is the sample mean of the differences and the standard error of d bar is given by S D over the square root of the sample size.

    03:28 Under the null hypothesis, the test statistic follows the t-distribution with n minus 1 degrees of freedom, just like it did in the one-sample t-procedures.
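
    As a minimal sketch of the mechanics just described, the function below computes the paired t statistic and its degrees of freedom from a list of differences using only the Python standard library. The function name `paired_t_statistic` and the toy input are assumptions for illustration; they are not from the lecture.

```python
import math
import statistics


def paired_t_statistic(differences, mu_0=0.0):
    """Return (t, degrees of freedom) for a paired t-test.

    differences: one difference per pair (e.g. five-day minus four-day).
    mu_0: hypothesized population mean difference under the null.
    """
    n = len(differences)
    d_bar = statistics.mean(differences)   # sample mean of the differences
    s_d = statistics.stdev(differences)    # sample SD (n - 1 denominator)
    se = s_d / math.sqrt(n)                # standard error of d bar
    t = (d_bar - mu_0) / se
    return t, n - 1


# Toy differences, chosen only to exercise the function.
t_stat, df = paired_t_statistic([1, 2, 3, 4, 5])
print(round(t_stat, 4), df)
```

    The statistic is then compared with a t-distribution on n minus 1 degrees of freedom, exactly as in the one-sample case.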

    03:38 So let's do the example on the mileage data that we just looked at.

    03:42 Do we have paired data? Yes, we do.

    03:45 We have two observations on each individual so we can look at the differences.

    03:50 Do we have independence? The individuals are likely to be independent.

    03:55 The randomization condition is not stated explicitly in the problem, but we're going to assume it.

    04:02 The 10% condition, the Lake County Health Department has more than 110 field workers, so we're good on the 10% condition.

    04:10 Now to the right you see a histogram of the differences.

    04:14 And so for the nearly normal condition, we have some problems with the normal assumption.

    04:19 We have two peaks and a right skew.

    04:21 So we have some problems with the nearly normal condition.

    04:24 But in order to get a feel for the mechanics of the test, we're going to do it anyway. So let's look at the mechanics.

    04:30 Well first we need our summary statistics for the differences.

    04:32 We have the sample mean difference is 982 miles.

    04:36 The sample standard deviation is 1139.568 miles and our sample size is 11.

    04:44 So in this test, what we're assuming initially is that there's no difference between the mileage for the four-day and the five-day work week.

    04:50 So we're gonna assume that mu D is zero; that's our null hypothesis.

    04:55 Our test statistic then is d bar over the quantity S D divided by the square root of n, or 982 divided by the quantity 1139.568 over the square root of 11, which gives us 2.858. The significance level is 5%, and what we're looking to do is to see whether the five-day mileage is, on average, greater than the four-day mileage.

    05:20 So we reject the null hypothesis if our test statistic takes a value greater than or equal to the critical value of t with 10 degrees of freedom at the 0.05 level, which is 1.812.

    05:30 Our test statistic took a value of 2.858.

    05:34 So we reject the null hypothesis and conclude that there is evidence to suggest that average mileage decreases during the four-day work week versus the five-day work week.
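
    The arithmetic of this test can be checked with a short Python sketch. It uses only the summary statistics quoted in the lecture (mean difference 982, standard deviation 1139.568, n = 11) and takes the critical value 1.812 from the t-table as stated, rather than computing it. The variable names are assumptions.

```python
import math

# Summary statistics for the differences, as quoted in the lecture.
d_bar = 982.0      # sample mean difference (miles)
s_d = 1139.568     # sample standard deviation of the differences (miles)
n = 11             # number of field workers
mu_0 = 0.0         # null hypothesis: no mean difference

# One-sided critical value for 10 degrees of freedom at the 5% level,
# taken from the t-table value given in the lecture.
t_crit = 1.812

se = s_d / math.sqrt(n)         # standard error of d bar
t_stat = (d_bar - mu_0) / se    # paired t statistic
reject = t_stat >= t_crit       # upper-tailed test

print(f"SE = {se:.3f}, t = {t_stat:.3f}, reject H0: {reject}")
```

    The standard error comes out at roughly 343.593 and the statistic at roughly 2.858, which agrees with the values in the lecture.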

    05:43 What if we want a confidence interval for the mean difference? Well if the conditions for the paired t-test are met, then we can form a 100 times 1 minus alpha percent confidence interval for the mean difference in the following way.

    05:55 We take d bar plus or minus t* with n minus 1 degrees of freedom times the standard error of the mean difference.

    06:03 This is the same form as we had in the one-sample t-interval.

    06:08 For the mileage example, we wanna construct a 95% confidence interval for the mean difference.

    06:13 So using the table, we find that t* with 10 degrees of freedom is 2.228.

    06:19 We found during the hypothesis test that d bar was 982 and the standard error of d bar was 343.593.

    06:29 So when we form our confidence interval, we take 982 plus or minus 2.228 times 343.593.

    06:38 And what that gives us is an interval of 216.475 up to 1,747.525 miles.

    06:47 So what that tells us is that we are 95% confident that the average mileage for the five-day work week is between 216.475 and 1,747.525 miles higher than that for the four-day work week.
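
    The interval can be reproduced with a few lines of Python. This sketch plugs in the values quoted in the lecture (d bar of 982, standard error of 343.593, and the table value 2.228 for 10 degrees of freedom); the variable names are assumptions.

```python
# Values quoted in the lecture.
d_bar = 982.0      # sample mean difference (miles)
se = 343.593       # standard error of d bar, as rounded in the lecture
t_star = 2.228     # two-sided 95% critical value, 10 degrees of freedom

margin = t_star * se          # margin of error
lower = d_bar - margin
upper = d_bar + margin

print(f"95% CI for the mean difference: ({lower:.3f}, {upper:.3f}) miles")
```

    Because the whole interval lies above zero, it is consistent with the test's conclusion that the five-day mileage is higher on average.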

    07:04 With the paired t-test, there are a bunch of things that can go wrong.

    07:07 So here are some things that we want to avoid.

    07:09 We don't want to use a two-sample t-test when we have paired data because we know that our groups are not independent if we have paired data.

    07:17 We don't want to use a paired t-procedure when the data are not paired.

    07:21 So those first two things kinda go together.

    07:25 Don't forget to look out for outliers.

    07:27 This can indicate problems with the nearly normal assumption.

    07:31 And do not use side-by-side boxplots or histograms to look for the difference between the means of the paired groups, because the measurements are not from two independent groups.

    07:39 So we're not doing this as we would for a two-sample t-test.

    07:44 So what have we done in this lecture? Well, we examined the difference between paired data and the type of data that enables us to use a two-sample t-test.

    07:53 We described how to carry out the paired t-test as well as how to construct a paired t confidence interval for the mean difference.

    08:04 We finished up by looking at some things that can go wrong and things that we wanna avoid when we use the paired t-procedures.

    08:11 This is the end of lecture 8, and I look forward to seeing you back for lecture 9.

    About the Lecture

    The lecture Inference for Paired Data by David Spade, PhD is from the course Statistics Part 2. It contains the following chapters:

    • Inference for Paired Data
    • The Paired t-Test
    • Pitfalls to Avoid

    Included Quiz Questions

    1. Paired data refers to situations in which two measurements are taken on the same individual and the differences in the measurements are observed
    2. Paired data refers to data in which the measurements are taken on different individuals in each group
    3. With paired data, the data in each group are independent
    4. There is no difference between paired data and the type of data used for the two-sample t-test
    1. The differences can have any distribution, and the paired t-procedures will still work well regardless of the sample size
    2. The data must be paired
    3. The differences must be independent
    1. There is no difference between the paired t-test and the one-sample t-test after the differences are calculated because the differences can be viewed as a random sample from a single population
    2. The standard error is calculated differently for the differences than it is for the individual observations in the one-sample t-test
    3. The degrees of freedom for the test statistic are computed differently for the two tests
    4. The test statistic is calculated differently for the two tests
    1. There is no difference in the two confidence interval procedures because the differences can be viewed as a random sample from one population
    2. The degrees of freedom are different between the two procedures
    3. The standard error calculation is different between the two procedures
    4. The critical value is found differently between the two procedures
    1. It is important to examine side-by-side box plots or histograms for differences when the data are paired
    2. It is important to be careful not to use a two-sample t-test with paired data
    3. It is important not to use the paired t-procedures when the data are not paired
    4. It is important to be cautious of outlying differences when working with paired data

    Author of lecture Inference for Paired Data

     David Spade, PhD

