Introduction to Statistical Bias
The ultimate aim of epidemiology, one of the important fields of medicine, is to investigate the causes of disease. Human health is exposed to harm from viruses, bacteria and other microorganisms, the pathogens that disrupt the internal milieu of the body.
In addition to infections, many chronic diseases, such as diabetes mellitus and autoimmune diseases, also cause significant mortality and morbidity. The aim of epidemiology and biostatistics is to establish evidence for the causes, routes of dissemination and frequency of particular diseases, the consequences of a disease, and the therapeutics available to target it.
Epidemiology studies the causes of disease (specifically the pathogens responsible), its effects, possible cures, and methods for improving people’s lives overall (aiming at better health, which in turn contributes to a peaceful society).
The term epidemiology is derived from three Greek words:
- “Epi” refers to “among/upon.”
- “Demos” relates to “people (population).”
- “Logos” relates to “scientific research (study).”
As a science, epidemiology studies the influence of pathogens on individuals suffering from the diseases they cause and their drastic effects on the human body. For many diseases, it is very difficult to establish a causal link. In these cases, epidemiology plays a key role in determining the causal role of a pathogen in the particular disease.
In addition, it determines the rate at which the disease spreads through a locality, its distribution pattern and the locations where it predominantly spreads. All of this in turn helps in identifying unexplored causal elements.
All the above-mentioned tasks of epidemiology and biostatistics are carried out by examining the pattern and details of the disease and then reaching a conclusion based on these observations.
Sampling and experimentation are also taken into account. Experimentation, in turn, involves a diverse array of activities, such as collecting data, analyzing the collected data and, based on the conclusions, forming hypotheses about the disease.
Bias is a person’s inclination to hold or present a particular perspective (which in real life may be true or false). It is someone’s natural tendency to act or feel in a particular way based on that inclination.
In statistics, bias refers to an error. More precisely, it is a systematic error: unlike random error, it cannot be corrected by repeating the process again and again and taking the average of the results.
Suppose the aim of our experiment is to determine the area of a mall whose actual area is around 150 yards. We repeat the measurement several times and look at how much each result varies from the actual area. If you run a random survey and ask people “What is the area of that mall?” and everyone answers “140 yards”, then every answer has an error of 10 yards, and the squared error is always 100 (ignoring the units of the squared error).
Now consider another example, in which the Americans surveyed are uncertain about the area, and their answers have a mean of 90 yards and a standard deviation of 10 yards. If you randomly poll two different people and one answers “90 yards” while the other says “70 yards”, the first answer has an error of -10 yards and the second an error of -30 yards (these errors imply a true area of 100 yards in this example).
In the first case, the random perturbation leaves the answer closer to the correct value, whereas the second answer lies further from the actual value. Thus the perturbation can work in both directions: it can take a result towards the correct value or further away from it.
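The two survey examples can be summarized numerically. The following is a minimal sketch, not part of the original text: the function name `error_summary` is hypothetical, and the true area in the second example is assumed to be 100 yards because that is what the quoted errors of -10 and -30 yards imply. It separates the systematic part of the error (bias) from the spread of the answers (variance) and shows that the mean squared error is the squared bias plus the variance.

```python
from statistics import fmean, pvariance


def error_summary(answers, true_value):
    """Return (bias, variance, mean squared error) for a list of answers."""
    errors = [a - true_value for a in answers]
    bias = fmean(errors)                    # systematic part of the error
    variance = float(pvariance(answers))    # spread of the answers around their own mean
    mse = fmean([e * e for e in errors])    # average squared error
    return bias, variance, mse


# First example: every respondent says 140 when the actual area is 150.
print(error_summary([140, 140, 140], 150))   # (-10.0, 0.0, 100.0)

# Second example: answers of 90 and 70; a true area of 100 is ASSUMED here,
# inferred from the errors of -10 and -30 quoted in the text.
bias, variance, mse = error_summary([90, 70], 100)
print(bias, variance, mse)                   # -20.0 100.0 500.0
print(bias ** 2 + variance)                  # 500.0, equal to the mean squared error
```

In the first example the error is pure bias: repeating the survey and averaging never removes it. In the second, part of the error comes from the variation between respondents.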
Selection bias is a type of error that arises from how the study population is selected: the participants of a trial or, in surveys and observational studies, the population from which the views are obtained.
Selection bias is an unfair type of selection. It results when participants are not all equally interested in the experiment, i.e. some of the participants are willing, while others are forced to participate.
Selection bias can arise at the following stages:
- The stage of allocation of participants
- Keeping them engaged in the activity
- Along the course of the experiment.
Selection bias involves the following:
- Self-selection of participants
- Selection of samples to propose the hypothesis for the respective experiment.
Sources of selection bias
- Subversion of randomization due to lack of allocation concealment
- Attrition (loss of participants during the study, e.g. through dropout or withdrawal)
Types of selection bias
The following types of selection bias are distinguished:
- Sampling bias
- Time interval
- Observer selection
Sampling bias results from a non-random sample of participants, i.e. some of the participants take part willingly, while others are forced to get involved.
Methods such as random sequence generation are used to select the sample. It should also be remembered that the laws of inferential statistics apply and hold true only when the randomisation has been done properly. Sampling bias is a subtype of selection bias, but some researchers consider it a separate type of bias.
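As an illustration of selecting a sample by random sequence generation, here is a minimal sketch using Python's standard `random` module. The function name `draw_random_sample`, the participant identifiers and the seed are all hypothetical; a real trial would follow the randomisation procedure laid down in its protocol.

```python
import random


def draw_random_sample(population, sample_size, seed=None):
    """Draw a simple random sample without replacement.

    A fixed seed makes the draw reproducible for auditing; it is an
    illustrative choice, not a requirement of the method.
    """
    rng = random.Random(seed)
    return rng.sample(population, sample_size)


# Hypothetical sampling frame of 100 participant identifiers.
participants = [f"P{i:03d}" for i in range(1, 101)]

sample = draw_random_sample(participants, 10, seed=42)
print(sample)
```

Every participant in the frame has the same chance of being chosen, which is the property that non-random (for example, self-selected) samples lack.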
A major difference between selection bias and sampling bias is that selection bias concerns internal validity, i.e. the similarities or differences found within the sample, while sampling bias concerns external validity, i.e. the ability to generalize the result to the remaining population.
Examples of sampling bias:
- Pre-selection of trial participants
- Discounting trial subjects who did not complete the study
- Migration bias by omitting subjects who were recently added to or removed from the study.
Time interval: early termination of a trial at a time when its results support the desired conclusion. A trial may also be terminated prematurely when an adverse effect is severe.
A trial may be terminated early at an extreme value, but the extreme value is most likely to be reached by the variable with the largest variance, even if the mean of all the variables is similar.
Clinical susceptibility bias: when a person has one disease and that disease increases the vulnerability to a second disease, the treatment of the first disease can erroneously appear to predispose to the second. For example, “postmenopausal syndrome” makes somebody more vulnerable to “endometrial cancer”. Hence, estrogen taken for postmenopausal syndrome may receive more blame than it deserves for causing endometrial cancer.
Protopathic bias: when a treatment for the first symptoms of a disease or another outcome appears to cause the outcome. It can be mitigated by “lagging”, i.e. exclusion of exposures that occurred in a certain period before diagnosis.
Indication bias: a potential mix-up between cause and effect when exposure depends on indication, e.g. a treatment is given to people at high risk of acquiring a disease, potentially causing a predominance of treated people among those acquiring the disease. This may create the erroneous appearance that the treatment causes the disease.
- Partitioning data with knowledge of the contents of the partitions and then analyzing them with tests designed for blindly chosen subdivisions.
- Preference (cherry picking)
- Post hoc: Amendment of the data inclusion based on temporary or subjective reasons including
Selection of which studies to include in a meta-analysis. A proper meta-analysis should contain all the available evidence on the subject of interest. Omitting key studies would falsely alter the results.
Repeating experiments and reporting only the most favorable results. This is commonly done.
Presenting only the most significant results of data fishing.
Another method is data dredging, in which multiple statistical analyses are attempted on the same data (this increases the error rate of the study). It may be done as part of the study with legitimate intent, or it may be carried out to find out which of the comparisons reach statistical significance. To avoid this, the statistical test should be specified clearly in the initial protocol and then followed.
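The way multiple analyses inflate the error rate can be made concrete. The sketch below assumes, purely for illustration, that the tests are independent and that each uses the same significance level; the function name `familywise_error_rate` is hypothetical. Under those assumptions the chance of at least one false positive among k tests is 1 - (1 - alpha)^k.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one false positive among n independent tests,
    each run at significance level alpha (independence is an assumption)."""
    return 1 - (1 - alpha) ** n_tests


# How quickly the chance of a spurious "significant" finding grows.
for k in (1, 5, 10, 20):
    print(k, round(familywise_error_rate(0.05, k), 3))
```

With a level of 0.05, twenty independent comparisons already give roughly a 64% chance of at least one false positive, which is why the planned test should be fixed in the protocol beforehand.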
- Attrition is the gradual loss of participants from a study over time.
- Attrition bias is the selection bias caused by attrition. It includes dropout, non-response, withdrawal and protocol deviation.
Observer selection: data are filtered not only by study design and measurement, but also by the necessary precondition that there has to be someone doing a study.
Examples of selection bias:
- Bias due to the non-implementation or improper implementation of the allocation concealment
- RCT on thrombolysis with alternating day concealment
- Bias due to attrition
- RCT comparing medical versus surgical management of cerebrovascular disease
- Some disease circumstances can influence the circulation of blood to the brain (atherosclerosis can cause cerebrovascular disease).
A good researcher should have methods to overcome the shortcomings resulting from selection bias. For example, all participants should be exposed for the same duration, rather than some for one duration and others for another. The researcher should also explain in the report or survey the possible errors that can result.
As explained above, it is essential to follow the principles of randomisation in the study. Ultimately, selection bias cannot be avoided entirely, so the researcher should try to minimize its effects and report the important errors.
Information bias, also known as observational bias or scrutinization bias, is the bias that arises from an error in the measurement process.
Information bias is a cognitive bias (a systematic deviation from norms) in which the data are evaluated or analyzed in a distorted way; it is related to non-differential or differential misclassification.
Information bias includes:
- Classification error
- Differential and nondifferential bias
- Direction of bias
- Misclassification of co-variables.
The sources for the classification and the measurement error include:
- Instrumentation involved in the experiment
- Experimentation environment (laboratory)
- Questionnaire (in the case of survey)
Differential and non-differential bias
Non-differential misclassification is an error whose frequency is approximately the same across the two or more groups being compared.
Differential misclassification is an error whose frequency is relatively higher in one of the groups being studied.
In real life, non-differential misclassification of exposure is much more common than differential misclassification.
Direction of bias:
- Towards the null
- Away from null
The “null” is “zero” for differences and “1” for ratios.
The direction of bias is usually unpredictable.
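To make the idea of direction concrete, the following sketch applies exposure misclassification to a 2x2 case-control table and recomputes the odds ratio. All counts, sensitivities and specificities are hypothetical values chosen for illustration, and the function names are not from the original text. In this particular binary setting, equal error rates in both groups pull the odds ratio towards the null value of 1, while better exposure detection among cases pushes it away; other settings can behave differently.

```python
def observed_exposure(exposed, unexposed, sensitivity, specificity):
    """Expected counts classified as exposed / unexposed after measurement error."""
    seen_exposed = sensitivity * exposed + (1 - specificity) * unexposed
    seen_unexposed = (exposed + unexposed) - seen_exposed
    return seen_exposed, seen_unexposed


def odds_ratio(case_exp, case_unexp, ctrl_exp, ctrl_unexp):
    """Odds ratio of exposure, cases versus controls."""
    return (case_exp * ctrl_unexp) / (case_unexp * ctrl_exp)


# Hypothetical true counts: (exposed, unexposed).
CASES = (60, 40)
CONTROLS = (40, 60)

true_or = odds_ratio(*CASES, *CONTROLS)

# Non-differential: same sensitivity and specificity in cases and controls.
nondiff_or = odds_ratio(*observed_exposure(*CASES, 0.8, 0.9),
                        *observed_exposure(*CONTROLS, 0.8, 0.9))

# Differential: exposure is detected better among cases than among controls.
diff_or = odds_ratio(*observed_exposure(*CASES, 0.95, 0.9),
                     *observed_exposure(*CONTROLS, 0.8, 0.9))

print(round(true_or, 2), round(nondiff_or, 2), round(diff_or, 2))
```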
It is a term for the capacity of a person to answer a questionnaire ambiguously.
It refers to results that arise from a particular diagnostic procedure or a particular type of instrument. Reporting bias is of great concern in observational studies.
It is one of the most complicated phases of the analysis because the outcome may not be systematized.
Confounding refers to a situation in which the association between exposure and outcome is distorted by the presence of another variable. Unless the effect of the confounding variable is analysed and removed, proper causation cannot be established.
Confounding can be of two types: positive and negative.
Positive confounding is a type of confounding in which the researcher observes that the association is biased away from the null.
Negative confounding is a type of confounding in which the researcher observes that the association is biased towards the null.
A confounder is a variable extraneous to the causal relationship that partially or completely distorts the causal analysis when determining the risk factors of a disease. Results become inaccurate in the presence of a confounder.
For a variable to be declared as a confounder, it needs to satisfy the following conditions:
- It is a risk factor for the disease
- It is associated with the hypothesized risk factor (the exposure)
- It is not on the causal pathway between exposure and disease.
The last two conditions can be tested using an appropriate statistical test on the data, whereas the first condition is more biological and conceptual. One example of such a statistical test is hierarchical logistic regression, which can be applied to identify the confounding factor among a group of variables.
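A simpler way to see confounding in data than the logistic regression mentioned above is stratification: compare the crude odds ratio with one adjusted across the strata of the suspected confounder. The sketch below uses the Mantel-Haenszel pooled odds ratio, a technique swapped in here for illustration; the counts and function names are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """a: exposed cases, b: exposed non-cases, c: unexposed cases, d: unexposed non-cases."""
    return (a * d) / (b * c)


def crude_odds_ratio(strata):
    """Odds ratio after collapsing all strata into a single 2x2 table."""
    a, b, c, d = (sum(column) for column in zip(*strata))
    return odds_ratio(a, b, c, d)


def mantel_haenszel_odds_ratio(strata):
    """Odds ratio pooled across strata of the suspected confounder."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Hypothetical counts, one (a, b, c, d) tuple per level of the confounder.
STRATA = [
    (40, 60, 8, 12),   # confounder present: stratum odds ratio is 1
    (2, 18, 10, 90),   # confounder absent:  stratum odds ratio is 1
]

print(round(crude_odds_ratio(STRATA), 2))            # about 3.05
print(round(mantel_haenszel_odds_ratio(STRATA), 2))  # 1.0
```

Here the exposure appears to triple the odds of disease in the crude table, yet within each stratum there is no association at all. The gap between the crude and the adjusted estimate is the signature of a confounder, in this case positive confounding, since the crude estimate lies away from the null.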
Potential confounders can be determined by our:
Causation is important in epidemiology. Though there is no single exact definition of causation, the following five categories can be described:
- Necessary and sufficient
- Counterfactual
The strengths and weaknesses of all these categories should be scrutinized by readers in order to get a hold on the proposed characteristics of causation (which in turn will lead to the right definition, too).
Two categories, production and counterfactual, are present throughout the meaning of causation. The necessary and sufficient definition assumes that all causes are deterministic: the variable is indispensable (necessary) and can by itself (sufficient) cause the disease. Hence, as per both perspectives, heavy smoking can be cited as a cause of lung disease only when the presence of unknown deterministic variables is assumed.
Some causative factors, though they may be necessary, may not be sufficient to cause a disease, and additional factors are required to produce the full-blown disease.
Correlation describes the relationship between two variables. A relationship may be present or absent. When it is present, it can be either positive or negative.
The correlation coefficient (r) represents the strength of the correlation. There are two methods of determining the correlation coefficient, and the choice depends on whether the data at hand meet parametric assumptions. If the data are parametric, the Pearson correlation coefficient is generally used; if the data are nonparametric, the Spearman correlation coefficient is used.
The relationship between two correlated variables could be a causal association, or it may arise from some other aspect of the experiment.
In the case of a positive correlation, the dependent and independent variables move together, i.e. if one increases, the other also increases, and vice versa.
In the case of a negative correlation, the dependent and independent variables move in opposite directions, i.e. if one increases, the other decreases, and vice versa.
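The two coefficients can be compared on a small example. The sketch below is a plain-Python illustration with hypothetical data and function names; in practice a statistics library would normally be used. It computes the Pearson coefficient directly and obtains the Spearman coefficient as the Pearson coefficient of the ranks.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: y rises with x, but not along a straight line.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
y_falling = [10, 8, 6, 4, 2]

print(round(pearson(x, y), 3))           # positive, slightly below 1
print(round(spearman(x, y), 3))          # 1.0: the ordering agrees perfectly
print(round(pearson(x, y_falling), 3))   # -1.0: a perfect negative correlation
```

The rank-based coefficient reaches 1 for any strictly increasing relationship, which is why it suits data that do not meet parametric assumptions.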
Epidemiology and biostatistics are deeply interlinked and depend on each other in their principles. The rules of both disciplines are indispensable to any researcher and should be followed carefully in every study. Researchers should guard against the types of bias described in this document.