Dichotomous Variables – Statistics Basics

by Raywat Deonandan, PhD


    00:00 they are discrete, they are categorical. Now when we have a categorical variable with two levels, we call that a dichotomous variable. For example, male versus female: there are two levels, so that is a dichotomous variable. Or employed versus unemployed, or someone has a disease or doesn't have a disease, or guilty versus innocent in a courtroom. For an emergency therapy outcome, someone is either alive or deceased; you're never something in between. Why is this important? It is important because a lot of the computations that we will be doing in future lectures depend upon whether or not the exposure or outcome variables are dichotomous.

    00:38 The tables we build from two dichotomous variables are called two-by-two contingency tables, and we'll learn about them in greater detail in future lectures. We can also take a continuous variable and turn it into a dichotomous variable; we call that process dichotomization. For example, let's say I have the ages of six individuals in a study: 23, 17, 14, 35, 68 and 15. I can decide to categorize them into two groups, those who are under 18 and those who are 18 or over. Well, why would I want to do that? It may seem that I'm losing information by going from a continuous realm to a categorical realm. And it's true, I am losing information, so it's typically not advisable to do that. For example, I can compute a mean age or a median age from the raw ages; I can't do that with my age groups anymore. If I have an individual who is 23, I know that that person is over 18, but if I only know that someone is over 18, I don't know that he is 23. I've lost the ability to extract some nuance when I go to a dichotomous realm. So why would we want to do that? Again, it depends on the context. Can you imagine a scenario in which it would be useful to dichotomize at age 18? You probably can, because 18 is an important age in a variety of places: it's the age at which you can vote, and in some places drink. So maybe I care about whether the individuals in my study are of voting age or not, in which case I would cut them off at 18, and that cutoff has meaning. Again, context breeds information.
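
The dichotomization described above can be sketched in a few lines of Python. This is only an illustration: the six ages and the cutoff of 18 come from the lecture, while the function name `dichotomize` and the group labels are assumptions made for the example.

```python
from statistics import mean, median

# The six ages used in the lecture example.
ages = [23, 17, 14, 35, 68, 15]


def dichotomize(age, cutoff=18):
    """Map one age to one of two levels (hypothetical helper and labels)."""
    return "18 or over" if age >= cutoff else "under 18"


# Continuous variable -> dichotomous variable.
age_groups = [dichotomize(a) for a in ages]
print(age_groups)

# Summary statistics are available on the raw ages...
print("mean age:", mean(ages))
print("median age:", median(ages))

# ...but once only the labels are kept, all that can be done is count them.
print("under 18:", age_groups.count("under 18"))
print("18 or over:", age_groups.count("18 or over"))
```

The last two lines show the information loss the lecture mentions: the labels can be counted, but a mean or median age can no longer be recovered from them.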

    02:11 It's possible to create categorical variables with more than just two levels from my continuous set. It doesn't have to be dichotomous. For example, from that same set of six individuals I can create three categories: those who are 25 or under, those who are 26 to 50 and those who are over 50. You may have noticed that surveys you have participated in sometimes ask for your age group. That's what they're doing: they're artificially creating a categorical variable out of a continuous variable. So it's important to think about where these numbers might come from and how to manipulate them in order to learn some wisdom about a larger set of individuals. This brings us to what we call sampling. When we take a sample from a population, what we're doing is trying to get a representative set of individuals, upon which we can perform certain statistical tests that allow us to infer information about that population. Inference is the key word here, because there are two kinds of statistics: descriptive and inferential. With descriptive statistics I'm just describing the people that I have in front of me, six people with six different ages for example.
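
A grouping into three age categories could look like the sketch below. The boundaries are an assumption: the lecture does not say where exactly age 25 belongs, so this example places it in the youngest group. The function name `age_group` and the labels are also hypothetical.

```python
from collections import Counter

# The same six ages from the lecture example.
ages = [23, 17, 14, 35, 68, 15]


def age_group(age):
    """Assign an age to one of three categories (boundaries are assumed)."""
    if age <= 25:
        return "25 or under"
    if age <= 50:
        return "26-50"
    return "over 50"


# Descriptive statistics on the categorical variable: a simple frequency count.
counts = Counter(age_group(a) for a in ages)
for label in ("25 or under", "26-50", "over 50"):
    print(label, counts[label])
```

Counting how many of the six people fall in each group is an example of descriptive statistics: it describes only the individuals at hand and claims nothing about a larger population.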

    03:22 With inferential statistics, I'm using that information to learn something about the larger population at hand. So where does this sample come from? It comes from a larger population, sometimes called a reference population. We extract a sample from that larger population, we manipulate that sample with statistics and we learn something about the larger population.

    03:44 It's important that that sample be representative. Imagine if we selected a portion of that larger population that was atypical, that did not have the characteristics that one typically sees in that larger population. I might draw faulty conclusions about that greater population because I chose poorly. My sample must be representative.
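
To make the point about representativeness concrete, here is a small simulation. Everything in it is made up for illustration: the population of 10,000 ages, the sample size of 500, and the choice of "the 500 oldest people" as the atypical sample are all assumptions, not figures from the lecture.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical reference population: 10,000 adults aged 18 to 90.
population = [rng.randint(18, 90) for _ in range(10_000)]

# A simple random sample: every individual is equally likely to be chosen.
random_sample = rng.sample(population, 500)

# An atypical sample: only the 500 oldest people in the population.
atypical_sample = sorted(population)[-500:]

population_mean = mean(population)
random_mean = mean(random_sample)
atypical_mean = mean(atypical_sample)

print("population mean age:     ", round(population_mean, 1))
print("random sample mean age:  ", round(random_mean, 1))
print("atypical sample mean age:", round(atypical_mean, 1))
```

Under these assumptions the random sample's mean age should land close to the population's, while the atypical sample's mean is far too high, which is the kind of faulty conclusion the lecture warns about.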

    04:07 So let's say we want to do a study in the USA, we're going to measure the prevalence rate of perceived back pain and we're going to do it via telephone survey, which is a very common way to conduct health sciences investigations with populations of this size.

    04:23 I'm going to have to take a sample of the American population, ask them about their back pain and then draw conclusions about the overall American population. It's not feasible to ask the entire 300 million citizens of the USA about their back pain. I can’t afford it and I haven't got their phone numbers, so I have to use a sample. So I'm trying to generalize to all the adults in the USA. Who can I access via telephone survey? Well, only those who have telephones, obviously. How can I access them? Well, I'm going to buy a block of listed numbers from the phone company; this is typically how it is done. And who's in my study? Well, pretty much those who answer the phone and agree to participate. Alright. So all of the adults in the USA, that’s the reference population. Who is the accessible population? Those with telephones. And who is the sampling frame? Those individuals whose numbers I've purchased. Now from that frame I'm going to select a bunch to ask to participate, and those who agree are my sample. Now think about this: the sample is where I do my statistics, but I’m drawing conclusions about the reference population. So we end up with a bunch of people with phones and listed numbers who agree to be interviewed, and we are going to conclude from their results some wisdom about the entire American population.
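
The narrowing from reference population to sample can be sketched as a chain of filters. The five records below are invented purely for illustration, the field names are hypothetical, and the sketch simplifies the lecture's description by assuming that everyone in the sampling frame is asked to participate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Adult:
    """One hypothetical adult; all fields are invented for this sketch."""
    name: str
    has_phone: bool
    number_purchased: bool  # number is in the block bought from the phone company
    agreed: bool            # answered the call and agreed to participate


# Reference population: everyone we want to generalize to.
reference_population = [
    Adult("A", has_phone=True, number_purchased=True, agreed=True),
    Adult("B", has_phone=True, number_purchased=True, agreed=False),
    Adult("C", has_phone=True, number_purchased=False, agreed=False),
    Adult("D", has_phone=False, number_purchased=False, agreed=False),
    Adult("E", has_phone=True, number_purchased=True, agreed=True),
]

# Accessible population: those reachable by telephone at all.
accessible_population = [p for p in reference_population if p.has_phone]

# Sampling frame: those whose numbers were purchased.
sampling_frame = [p for p in accessible_population if p.number_purchased]

# Sample: those who agreed to take part.
sample = [p for p in sampling_frame if p.agreed]

for label, group in [
    ("reference population", reference_population),
    ("accessible population", accessible_population),
    ("sampling frame", sampling_frame),
    ("sample", sample),
]:
    print(f"{label}: {len(group)}")
```

Each step can only keep or shrink the group, so the statistics are computed on the smallest set while the conclusions are aimed at the largest.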

    05:51 Ask yourself, is this a rational way to go? There is going to be some bias here. The kinds of people who still have a landline are not typical of most people, the kinds of people who are home when you call them are not typical of most people, and the kinds of people who agree to participate in this kind of study probably aren't typical of most people. They are a particular kind of American phone owner, and yet their responses are going to allow us to generalize to the greater American population. That's a kind of bias, and we’re going to talk about biases in more depth in a later lecture.

    About the Lecture

    The lecture Dichotomous Variables – Statistics Basics by Raywat Deonandan, PhD is from the course Statistics: Basics.


    Author of lecture Dichotomous Variables – Statistics Basics

     Raywat Deonandan, PhD

