00:00
they are discrete, they are categorical. Now
when we have a categorical variable with two
levels, we call that a dichotomous variable.
For example, male versus female: there are
two levels, so it is a dichotomous variable. Or employed
versus unemployed, or has a disease versus
doesn't have a disease, or guilty versus innocent
in a courtroom. As an emergency therapy outcome,
we can have someone who's alive or deceased;
you're never something in between. Why is
this important? It is important because a
lot of the computations that we will be doing
in future lectures depend upon whether or not
the exposure or outcome variables are dichotomous.
00:38
Those computations use what we call two-by-two
contingency tables, and we'll learn about those in greater
detail in future lectures. We can also take
a continuous variable and turn it into a dichotomous
variable, we call that the process of dichotomization.
For example, let's say I have the ages of
six individuals in a study: 23, 17, 14, 35,
68 and 15. I can decide to categorize them
into two groups, those who are under 18 and
those who are 18 or over. Well, why would
I want to do that? It may seem that I'm losing
information by going from a continuous realm
to a categorical realm. And it's true, I am
losing information, it's typically not advisable
to do that. For example, I can compute a mean
age or a median age; I can't do that with
my age groups anymore. If I have an individual
who is 23, I know that that person is over
18, but if I know only that someone's over 18,
I don't know that he is 23. I've lost the
ability to extract some nuance when I go to
a dichotomous realm. So why would we want to
do that? Again, it depends on the context;
can you imagine a scenario in which it would
be useful to dichotomize at age 18? You probably
can, because 18 is an important age in a variety
of places: it's the age at which you can sometimes
drink or vote. So maybe I care about whether the
individuals in my study are of voting age
or not, in which case I would cut them off
at 18 and that has meaning. Again, context
breeds information.
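To make the trade-off concrete, here is a minimal Python sketch of dichotomizing the six ages at 18. The helper name `dichotomize` and its `cutoff` parameter are made up for this illustration; the ages are the ones from the example. It shows that a mean and median can be computed from the raw ages, while the grouped version only supports counts.

```python
from collections import Counter
from statistics import mean, median

ages = [23, 17, 14, 35, 68, 15]  # the six ages from the example

def dichotomize(age, cutoff=18):
    """Return a two-level label for an age (hypothetical helper)."""
    return f"{cutoff} or over" if age >= cutoff else f"under {cutoff}"

# Continuous data: a mean and a median are both available.
print("mean age:", mean(ages))
print("median age:", median(ages))

# Dichotomized data: only the labels remain, so we can count
# members of each group but can no longer recover a mean or median.
age_groups = [dichotomize(a) for a in ages]
group_counts = Counter(age_groups)
print(age_groups)
print(group_counts)
```

Changing `cutoff` to some other age gives a different dichotomous variable from the same data, which is why the choice of cutoff should come from the context of the study.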
02:11
It's possible to create categorical variables
with more than just two levels from my continuous
set. It doesn't have to be dichotomous. For
example, I can sort that same set of six individuals
into three categories: those who
are 25 or under, those who are 26-50 and those
who are over 50. You may notice that from
surveys you may have participated in, sometimes
they ask for your age group, that's what they're
doing; they're artificially creating a categorical
variable out of a continuous variable. So
it's important to think about where these
numbers might come from and how to manipulate
them in order to learn some wisdom about a
larger set of individuals. This brings us
to sampling. When we take a sample from
a population, what we're doing is trying to
get a representative set of individuals, upon
which we can perform certain statistical tests
that allow us to infer information about that
population. Inference is the key word here,
because there are two kinds of statistics:
descriptive and inferential. With
descriptive statistics I'm just describing
the people that I have in front of me, six
people with six different ages for example.
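The three-level age grouping mentioned earlier in this segment can be sketched the same way. This is only an illustration: the function name `age_group` is hypothetical, and the way the boundary ages are assigned (25 goes in the youngest group, 50 in the middle group) is an assumption made here so that every age lands in exactly one category.

```python
from collections import Counter

ages = [23, 17, 14, 35, 68, 15]  # the six ages from the example

def age_group(age):
    """Assign an age to one of three groups (boundary handling is assumed)."""
    if age <= 25:
        return "25 or under"
    if age <= 50:
        return "26-50"
    return "over 50"

# Count how many of the six individuals fall in each category.
group_counts = Counter(age_group(a) for a in ages)
print(group_counts)
```

Counting people per group like this is a descriptive statistic: it summarizes only the six people in hand and says nothing yet about a larger population.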
03:22
With inferential statistics, I'm using that
information to learn something about the larger
population at hand. So where does this sample
come from? It comes from a larger population,
sometimes called a reference population. We
extract a sample from that larger population,
we manipulate that sample with statistics
and we learn something about the larger population.
03:44
It's important that that sample be representative.
Imagine if we selected a portion from that
larger population that was atypical, that
did not have the characteristics that one
typically sees in that larger population.
I might draw faulty conclusions about that
greater population because I chose poorly.
My sample must be representative.
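A small simulation can show why this matters. Everything in the sketch below is invented for illustration: the population of 10,000 simulated ages, the sample size of 500, and the choice of "the 500 oldest people" as the atypical sample. It compares how far each sample's mean age lands from the population's mean age.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical reference population: 10,000 ages between 18 and 90.
population = [random.randint(18, 90) for _ in range(10_000)]

# Simple random sample: every individual is equally likely to be chosen.
random_sample = random.sample(population, 500)

# Atypical sample: only the 500 oldest individuals.
atypical_sample = sorted(population)[-500:]

population_mean = mean(population)
random_error = abs(mean(random_sample) - population_mean)
atypical_error = abs(mean(atypical_sample) - population_mean)

print("population mean age:", population_mean)
print("error of random sample:", random_error)
print("error of atypical sample:", atypical_error)
```

Under these assumptions the random sample's mean sits close to the population mean, while the atypical sample's mean is far from it, so any conclusion drawn from the atypical sample about the population's age would be faulty.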
04:07
So let's say we want to do a study in the
USA, we're going to measure the prevalence
rate of perceived back pain and we're going
to do it via telephone survey, which is a
very common way to conduct health sciences
investigations with populations of this size.
04:23
I'm going to have to take a sample of the
American population, ask them about their
back pain and make conclusions now about the
overall American population. It's not feasible
to ask the entire 300 million citizens of
the USA about their back pain. I can’t afford
it; I haven't got their phone numbers, so
I have to use a sample. So I'm trying to generalize
to all the adults in the USA. Who can I access
via telephone survey? Well, only those who
have telephones, obviously. How can I access
them? Well, I'm going to buy a block of listed
numbers from the phone company; this is typically
how it is done. And who's in my study? Well,
pretty much those who answer the phone and
agree to participate. Alright. So all of the
adults in the USA, that’s the reference
population. Who is the accessible population?
Well, those with telephones. And who is the
sampling frame? Those individuals whose numbers
I've purchased. Now from that frame I'm going
to select a bunch to ask to participate, and
those who agree are my sample. Now think about
this: the sample is where I do my statistics,
yet I’m making conclusions about the
reference population. So we end up with a
bunch of people with phones and listed numbers
who agree to be interviewed, and we are going
to conclude from their results some wisdom
about the entire American population.
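The chain from reference population to sample can be sketched as a series of nested sets. All of the numbers below are made up for illustration (a stand-in population of 100,000 people, 90% with telephones, 5,000 purchased numbers, 1,000 people called, 30% agreeing); none of them come from a real survey.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

# Reference population: stand-in IDs for all adults we want to describe.
reference_population = set(range(100_000))

# Accessible population: those with a telephone (assumed 90%).
accessible_population = {
    person for person in reference_population if random.random() < 0.9
}

# Sampling frame: the block of listed numbers we purchased (assumed 5,000).
sampling_frame = set(random.sample(sorted(accessible_population), 5_000))

# We call 1,000 people from the frame; an assumed 30% answer and agree.
invited = random.sample(sorted(sampling_frame), 1_000)
sample = {person for person in invited if random.random() < 0.3}

for name, group in [
    ("reference population", reference_population),
    ("accessible population", accessible_population),
    ("sampling frame", sampling_frame),
    ("sample", sample),
]:
    print(f"{name}: {len(group)}")
```

Each set is contained in the one before it, and the statistics are computed only on the last and smallest one.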
05:51
Ask yourself, is this a rational way to go?
There is going to be some bias here: the kinds
of people who still have a landline
are not typical of most people, the kinds
of people who are home when you call them
are not typical of most people, and the kinds
of people who agree to participate in this
kind of study probably aren't typical of most
people. They are a particular kind of American
phone owner, but yet their responses are going
to allow us to generalize to the greater American
population. That's a kind of bias and we’re
going to talk about biases in more depth in
a future lecture.
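As a rough illustration of how this kind of bias can distort a result, the sketch below builds a tiny made-up population in which people with a listed landline report back pain more often than everyone else. The counts are invented purely for the example and the direction of the difference is an assumption, not a finding. It then compares the true prevalence with the prevalence seen only among people the survey can reach.

```python
# Each person is a pair: (has_listed_landline, reports_back_pain).
# All counts are invented for illustration.
population = (
    [(True, True)] * 120      # landline, back pain
    + [(True, False)] * 280   # landline, no back pain
    + [(False, True)] * 60    # no landline, back pain
    + [(False, False)] * 540  # no landline, no back pain
)

def prevalence(people):
    """Proportion of people in the list who report back pain."""
    return sum(1 for _, has_pain in people if has_pain) / len(people)

# True prevalence in the whole (hypothetical) population.
true_prevalence = prevalence(population)

# Prevalence among only those a landline survey could reach.
reachable = [person for person in population if person[0]]
survey_prevalence = prevalence(reachable)

print("true prevalence:", true_prevalence)
print("survey prevalence:", survey_prevalence)
```

In this invented population the survey would report a prevalence of 30% when the true figure is 18%, simply because of who could be reached.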