# Selection Bias

by Raywat Deonandan, PhD

Transcript

00:01 Selection bias is when an erroneous conclusion can arise from how we select our subjects. There are two kinds of selection biases really, in a broad sense. The first kind is when there's a systematic difference between individuals in one group of my study and individuals in another group of my study, in a way that I did not intend. The other kind is when there is a systematic difference between those who were selected for a study and those who weren't. So consider this example. Let's say you're trying to measure the average height of all Americans, and to do so we're going to take a sample of some Americans and extrapolate that data to the entire country. My sample is made up of 100 professional basketball players. Do you see the problem? I think you do. Basketball players are typically much taller than most Americans, so the sample that I'm using is going to overestimate the average height of all Americans. That's a selection bias. Or maybe I'm interested in measuring the relationship between socioeconomic status, or SES, and health, and I'm going to do so by sending out a flyer for people to come meet me in a church basement at 11 AM, at which point I'm going to give them a questionnaire to ask about their health and their SES.
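The height example above can be sketched as a small simulation. This is only an illustration under stated assumptions: the population is synthetic (heights drawn from a normal distribution with an assumed mean of 170 cm and standard deviation of 10 cm), and "professional basketball players" are stood in for by the 100 tallest people in that synthetic population. The function name `simulate_height_bias` is hypothetical.

```python
import random
import statistics


def simulate_height_bias(seed=0, population_size=100_000, sample_size=100):
    """Compare a simple random sample with a sample selected on height."""
    rng = random.Random(seed)

    # Synthetic population: heights in cm (assumed mean and spread).
    population = [rng.gauss(170, 10) for _ in range(population_size)]

    # Unbiased selection: every person is equally likely to be sampled.
    random_sample = rng.sample(population, sample_size)

    # Biased selection: a stand-in for basketball players,
    # here simply the tallest people in the population.
    biased_sample = sorted(population)[-sample_size:]

    return {
        "population_mean": statistics.mean(population),
        "random_sample_mean": statistics.mean(random_sample),
        "biased_sample_mean": statistics.mean(biased_sample),
    }


result = simulate_height_bias()
for name, value in result.items():
    print(f"{name}: {value:.1f} cm")
```

The random sample should land close to the population mean, while the sample selected on height sits well above it, no matter how carefully each individual height is measured.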

01:09 Now what kind of people are going to show up in a church basement at 11 AM on a weekday? Think about it. What do you think? Well, I think the kinds of people that are going to show up are those who typically don't have jobs, and if you haven't got a job, you're probably going to be of a lower socioeconomic status. So I'm selecting an enriched, extreme population from which I'm going to derive a relationship between SES and health, and I'm going to extrapolate that relationship to describe the general trend of SES and health. That may not be appropriate, because my sample is specific: it is made up of those with low SES. That's a selection bias. Now consider another example. Let's say I'm doing a study on antibiotic completion rates among different ethnicities, and I'm doing the study in central Europe. I'm going to collect individuals from a central, immobile location, let's say an office somewhere, and I'll keep track of how well they're conforming to their antibiotic schedule. Now nomadic individuals, the Roma, are going to be lost to follow-up. They are going to move away, I won't know what their antibiotic completion rate is, and so I will lose their data. We call that a kind of loss to follow-up bias.
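The loss to follow-up problem can be sketched with some arithmetic. Everything here is hypothetical: the group labels, enrolment counts, retention fractions and completion rates are invented for illustration, and the helper `overall_rates` is not from the lecture. The point is that when most of one group is lost, the rate seen in the retained participants barely moves, whatever that group's true rate happens to be.

```python
def overall_rates(enrolled, true_rate, retained_fraction):
    """Return (true rate among everyone enrolled, rate seen among those retained).

    All three arguments are dicts keyed by group name.
    """
    total_enrolled = sum(enrolled.values())
    true_completed = sum(enrolled[g] * true_rate[g] for g in enrolled)

    # Only participants who are still reachable contribute data.
    retained = {g: enrolled[g] * retained_fraction[g] for g in enrolled}
    seen_completed = sum(retained[g] * true_rate[g] for g in enrolled)

    return true_completed / total_enrolled, seen_completed / sum(retained.values())


# Hypothetical study: the mobile group is mostly lost to follow-up.
enrolled = {"settled": 800, "mobile": 200}
retained_fraction = {"settled": 0.95, "mobile": 0.10}

# Try several possible true rates for the group we mostly cannot observe.
for mobile_rate in (0.50, 0.80, 0.95):
    true_rate = {"settled": 0.80, "mobile": mobile_rate}
    truth, seen = overall_rates(enrolled, true_rate, retained_fraction)
    print(f"mobile rate {mobile_rate:.2f}: true overall {truth:.3f}, observed {seen:.3f}")
```

Under these assumed numbers the true overall rate changes noticeably across the three scenarios, while the observed rate stays close to the settled group's rate, so the study cannot tell the scenarios apart.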

02:29 Now let's look at a very famous example of selection bias. In 1981 there was a study in the New England Journal of Medicine that showed an association between drinking coffee and getting pancreatic cancer. This was all over the news, and it made people very upset and very scared, because we love our coffee and we don't like our pancreatic cancer.

02:46 The problem is that the study was fraught with selection bias. It was a case-control study. If you remember what a case-control study is, it begins by ascertaining the outcome status: we find people who have a disease and people who don't have the disease, and we look backwards to see what their exposure was. So in this case-control study, the cases were people with pancreatic cancer and the controls were other people in the hospital. That's important. We typically choose our controls to be as similar to the cases as possible, to avoid any extraneous variables getting in the way, but they're different in the sense that one group has the outcome we care about, pancreatic cancer, and the other one doesn't.

03:30 So in this study, they looked backwards to see how many people in both groups had been exposed to coffee to an extreme extent. So do you see the bias here? Well, commonly in case-control studies, the biases tend to arise from how you select the controls, and in this study that was definitely the case. The doctors chose their controls from other gastrointestinal patients in the same hospital wing. So there were pancreatic cancer patients and other patients undergoing the same experience in the same wing, perhaps having the same doctors, but the controls had gastrointestinal disease. As a result, there is a bias, in the sense that those with G.I. disorders were less likely to have drunk coffee recently. That reduced the coffee exposure in the control group, and by comparison it artificially gave the sense that the pancreatic cancer group drank an unusual amount of coffee. That's not the case: compared to the general population, their coffee consumption is exactly the same. So that created a spurious association between coffee and cancer that does not appear in the general population, and we conclude today that coffee is probably not associated with pancreatic cancer. So that study had a bias that led to a very serious conclusion that was erroneous.
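The effect of control selection can be sketched with an odds ratio, the usual measure of association in a case-control study. The counts below are hypothetical and are not the data from the 1981 study. They simply assume that cases drink coffee at the same rate as the general population (80%), while G.I. controls drink less (60%).

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds ratio from a 2x2 case-control table (exposure odds in cases / controls)."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


# Hypothetical counts, 100 people per group.
cases = {"coffee": 80, "no_coffee": 20}                # pancreatic cancer patients
gi_controls = {"coffee": 60, "no_coffee": 40}          # controls with G.I. disorders
population_controls = {"coffee": 80, "no_coffee": 20}  # controls like the general population

or_gi = odds_ratio(cases["coffee"], cases["no_coffee"],
                   gi_controls["coffee"], gi_controls["no_coffee"])
or_population = odds_ratio(cases["coffee"], cases["no_coffee"],
                           population_controls["coffee"], population_controls["no_coffee"])

print(f"Odds ratio with G.I. controls:       {or_gi:.2f}")
print(f"Odds ratio with population controls: {or_population:.2f}")
```

With these assumed counts, the same cases give an odds ratio well above 1 against the G.I. controls and an odds ratio of 1 against population-like controls. The apparent association comes entirely from who was chosen as a control.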

04:53 So there are different kinds of selection biases. One kind is non-response bias. That's when those who agree to participate are systematically different from those who don't agree to participate.

05:04 So have you ever wondered why people volunteer for studies? People who volunteer are typically different from people who don't volunteer. They are more extroverted. They are more eager.

05:12 Or maybe they have a vested interest in whatever the study is about. Typically, people who work long hours or who have very demanding jobs don't volunteer for studies either. So automatically you're seeing a systematic difference between the kinds of people we typically get in studies and the kinds of people we don't, and that means our ability to generalize from a study population to the greater population is a little bit hindered. Now we have this thing

### About the Lecture

The lecture Selection Bias by Raywat Deonandan, PhD is from the course Statistical Biases.

### Included Quiz Questions

1. Sampling scheme
2. Total population
3. Informed consent
4. Time-frame of study
5. Method used for data collection
1. Control subjects were chosen from patients with GI disorders
2. Control subjects were chosen from hospitalized patients
3. Control subjects did not include those with pancreatitis
4. Control subjects were less likely to enjoy the taste of coffee
5. Control subjects were more likely to leave the hospital prior to study completion
1. Change the time of the meeting to avoid conflicting with working hours
2. Distribute the flyer to more areas
3. Offer more enticing compensation for study participation
4. Change the study type from retrospective to prospective
5. Recruit people at the hospital in person rather than with a flyer
1. It is impossible to determine the presence of bias after the data has been collected
2. Bias cannot be measured using statistics because it comes from the research process itself
3. The error must be systematic and result in an incorrect association or estimate
4. Bias is entirely or mostly avoidable
5. It is easier to remove bias during design and implementation than during data analysis
1. Create errors that can be measured through statistical analysis
2. Mask an association between two variables that are really related
3. Cause us to underestimate the size of a real relationship
4. Cause us to overestimate the size of a real relationship
5. Create a spurious relationship between two variables
1. Those excluded due to informed consent
2. Those excluded based on inclusion criteria
3. All were discussed as potential sources of selection bias
4. Those who are lost to follow up
5. Those who chose not to participate in the study
