# Statistical Tests and Data Representation

One of the main objectives of research and medical studies is to learn which associations or outcomes are not a product of chance. Depending on the study's design and the data it provides, a hypothesis can be rejected or not rejected, which allows investigators to determine whether variables are correlated. Statistical tests are tools used by researchers to obtain information and meaning from pools of variable data. These tests come in several forms, including, for example, the chi-square and Fisher exact tests, and are chosen depending on the needs of the investigators and the characteristics of the variables being analyzed. Study results can be considered statistically significant based on calculated p-values and predetermined levels of significance (known as the α-level). Confidence intervals are another way to express the significance of a statistical result without using a p-value.

Last updated: Aug 9, 2022

Editorial responsibility: Stanley Oiseth, Lindsay Jones, Evelin Maza

## Introduction

Hypothesis testing is used to assess the plausibility of a hypothesis by analyzing study data.

For example, a company creates a new Drug X that is intended to treat hypertension. The company wants to know whether Drug X does in fact work to lower blood pressure (BP), so they need to do hypothesis testing.

Steps for testing a hypothesis:

1. Formulate the hypothesis.
2. Choose which statistical test you are going to use.
3. Set the significance level.
4. Calculate the test statistics from your data using the appropriate/chosen test.
5. Conclusions:
• A decision is made to reject or not reject the null hypothesis from step 1.
• This decision is based on the predetermined levels of significance from step 3.
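The 5 steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the BP readings are hypothetical, and a large-sample two-sample z-test is assumed for simplicity (a t-test would be the more usual choice for groups this small).

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Hypothetical end-of-study systolic BP readings (mm Hg); not real trial data.
drug_x = [128, 131, 125, 122, 135, 127, 124, 130, 126, 129,
          123, 132, 121, 128, 126, 125, 130, 127, 124, 129]
placebo = [134, 138, 131, 136, 140, 133, 129, 137, 135, 132,
           139, 130, 136, 134, 141, 133, 135, 138, 131, 137]

# Step 1: H0: mean BP is equal in both groups; H1: the means differ.
# Step 2: chosen test (an assumption here): two-sample z-test.
# Step 3: significance level, fixed before looking at the data.
alpha = 0.05

# Step 4: test statistic and two-sided p-value.
diff = mean(drug_x) - mean(placebo)
se = sqrt(stdev(drug_x) ** 2 / len(drug_x) + stdev(placebo) ** 2 / len(placebo))
z = diff / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Step 5: reject H0 only if the p-value falls below the preset alpha.
reject_h0 = p_value < alpha
print(f"z = {z:.2f}, p = {p_value:.4g}, reject H0: {reject_h0}")
```

The decision in step 5 uses only the α chosen in step 3, which is why that value has to be fixed before the data are analyzed.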

## Formulating a Hypothesis

A hypothesis is a preliminary answer to a research question (i.e., a “guess” about what the results will be). There are 2 types of hypotheses: the null hypothesis and the alternative hypothesis.

### Null hypothesis

• The null hypothesis (H0) states that there is no difference between the populations being studied (or put another way, there is no relationship between the variables being tested).
• Written as a formula, H0: µ1 = µ2, where µ1 and µ2 represent the means (or average measurements) of groups 1 and 2, respectively
• Example: Drug X was created to lower BP. An experiment is designed to test whether Drug X actually lowers BP. Drug X is given to 1 group, while a 2nd group gets a placebo. The null hypothesis would state that Drug X has no effect on BP and that both groups will have the same average BP at the end of the study period.

### Alternative hypothesis

• The alternative hypothesis (H1) states that there is a difference between the populations being studied.
• Written as a formula, H1: µ1 ≠ µ2
• Example: In the experiment described above, the alternative hypothesis is that Drug X lowers BP, and that patients in the study group getting Drug X will have lower BP than patients in the placebo group at the end of the study period.
• H1 is a statement that researchers think is true.

### What is the study really testing?

• Hypothesis testing on samples can never verify a hypothesis with certainty; it can only quantify how compatible the observed data are with a hypothesis.
• A research study involving hypotheses will either reject or fail to reject the null hypothesis.

### Examples

Example 1: rejecting the null hypothesis

In the example above, if the findings of the trial show that Drug X does in fact significantly lower BP (that is, there is sufficient statistical evidence to support it), then the null hypothesis (postulating that there is no difference between the groups) is rejected at the chosen level of significance. Note that these findings cannot confirm the alternative hypothesis; they only support it to a degree determined by the sampling distribution in the population tested.

Example 2: failing to reject the null hypothesis

In the example above, if the findings of the trial show that Drug X did not significantly lower BP, then the study failed to reject the null hypothesis. Again, note that the findings cannot confirm the null hypothesis; they only show that the data did not provide sufficient evidence against it, given the sampling distribution in the population tested.

### Types of errors and power

• Type I error:
• The null hypothesis is true, but is rejected.
• The chance of committing a type I error is represented as α.
• Type II error:
• The null hypothesis is false, but is not rejected.
• The chance of committing a type II error is represented as β.
• Power:
• The probability that a test will correctly reject a false null hypothesis
• Power = 1 – β
• Power depends on:
• Sample size (e.g., larger sample size → ↑ power)
• Size of expected effect (e.g., higher/larger expected effect → ↑ power)
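The dependence of power on sample size and effect size can be illustrated numerically. The sketch below assumes a two-sided, two-sample z-test with a known common standard deviation; the function name `approx_power` and the example values (a 5 mm Hg effect, an SD of 12 mm Hg) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(effect, sd, n_per_group, alpha=0.05):
    """Approximate power (1 - beta) of a two-sided, two-sample z-test.

    effect: true difference in means; sd: common standard deviation.
    The very small chance of rejecting in the wrong direction is ignored.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sd * sqrt(2 / n_per_group)
    return NormalDist().cdf(abs(effect) / se - z_crit)

# Hypothetical scenario: 5 mm Hg true reduction, SD of 12 mm Hg.
for n in (20, 50, 100, 200):
    print(n, round(approx_power(5, 12, n), 3))

# A larger expected effect also raises power at a fixed sample size.
print(round(approx_power(10, 12, 50), 3))
```

In practice this relationship is usually run in reverse: investigators fix the desired power and the expected effect, then solve for the sample size.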

## Determining Statistical Significance

Statistical significance is the idea that a test outcome is highly unlikely to have been produced simply by chance. To determine statistical significance, you need to set an α-level and calculate a p-value.

### P-values

A graph can be created in which possible study results are plotted on the x-axis and the probability of observing each result (assuming the null hypothesis is true) is plotted on the y-axis. The p-value is the area under the curve for results at least as extreme as the one observed.

• The p-value is the probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true.
• In other words, the p-value is the probability that you would get this result if there were no relationship between the variables and the results occurred simply by chance.
• Like all probabilities, the p-value is between 0 and 1.
• Higher p-values (larger areas under the curve):
• Indicate that the observed result is consistent with the null hypothesis
• Suggest that the data provide little evidence of a relationship between your variables
• Example: In the example above, a p-value of 0.6 would mean that the study provides little evidence that Drug X is associated with lower BP.
• Lower p-values (smaller areas under the curve):
• Indicate that the observed result would be unlikely if the null hypothesis were true
• Suggest that an observed correlation between your variables is unlikely to be due simply to chance and that a true relationship likely exists
• Example: In the example above, a p-value of 0.02 suggests that Drug X is associated with lower BP.
• If the p-value is lower than your predetermined level of significance (α-level), you can reject the null hypothesis, because there likely is a real relationship between your variables.
• The lower the p-value, the stronger the evidence that the relationship between your variables is real (and not due to chance).

Mnemonic:

“If the p is low, the null (hypothesis) must go.”
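The "area under the curve" idea can be made concrete with a short sketch. It assumes the test statistic follows a standard normal distribution when the null hypothesis is true; the helper name `two_sided_p` is hypothetical.

```python
from statistics import NormalDist

def two_sided_p(z):
    """Area in both tails of the standard normal curve beyond |z|."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

# A statistic near the center of the null distribution leaves a large tail
# area (high p-value); one far from the center leaves a small tail area.
for z in (0.5, 1.0, 2.0, 3.0):
    print(f"z = {z:.1f} -> p = {two_sided_p(z):.3f}")
```

The further the observed statistic lies from what the null hypothesis predicts, the smaller the remaining tail area, and the lower the p-value.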

### α-level

• The α-level is a threshold for the p-value that represents an arbitrarily determined “significance level.”
• The α-level should be chosen prior to conducting a study.
• By convention, the α-level is typically set at 0.05 or 0.01.
• The α-level is the risk you are willing to take of making a wrong decision, in which you incorrectly reject the null hypothesis (when it is in fact true).
• Example:
• An α-level of 0.05 means you will conclude that a relationship between your variables exists if the p-value is < 0.05.
• This means you are willing to accept up to a 5% chance of committing a type I error.
• In the Drug X BP example, if the p-value was 0.03, then you would conclude that:
• Drug X is associated with lower BP → this is a rejection of the null hypothesis
• If the null hypothesis were in fact true (Drug X is not actually associated with lower BP), a result at least this extreme would be expected only about 3% of the time, which is within the 5% risk of a type I error that was accepted in advance.
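The meaning of α as a long-run error rate can be checked by simulation. The sketch below is an illustration only: it repeatedly draws 2 groups from the same distribution (so the null hypothesis is true by construction), applies a two-sample z-test with a known SD of 1, and counts how often the null hypothesis is wrongly rejected. The function name and all parameter values are assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

def simulate_type_i_rate(n_trials=2000, n_per_group=30, alpha=0.05, seed=1):
    """Fraction of simulated null trials in which H0 is wrongly rejected."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sqrt(2 / n_per_group)  # known SD of 1 in both groups
    rejections = 0
    for _ in range(n_trials):
        # Both groups come from the same distribution, so H0 is true.
        group_a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        group_b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        z = (mean(group_a) - mean(group_b)) / se
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_trials

print(simulate_type_i_rate())
```

The printed fraction should land close to the chosen α, which is the sense in which α is the accepted risk of a type I error.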

### Confidence intervals

• A CI is a range of values, calculated from the sample, that is likely to contain the true population value (e.g., the true mean).
• CIs measure the degree of uncertainty in sampling.
• If the same population were sampled repeatedly and a CI calculated each time, the intervals would vary from sample to sample, and most of them would contain the true value.
• CIs are calculated using the sample size, the sample’s mean, and the standard deviation (online calculators and standard tables are typically used).
• The confidence level for CIs is the proportion of such intervals that would contain the true result over many repeated samples.
• Most commonly, a 95% confidence level is used (though the confidence level often ranges from 90% to 99%)
• A 95% CI is a range of values produced by a method that captures the true mean of the population in 95% of repeated samples.
• Like the α-level, the CI confidence level is chosen prior to testing the data.
• The higher the confidence needed, the larger the interval will be.
• Example: Researchers want to determine the average height in a population of 1000 men. Heights are measured in a random sample of 50 of these men.
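A CI for a mean can be computed from the 3 ingredients listed above. The sketch below uses a normal approximation and ignores the finite-population correction; the sample mean (175 cm) and SD (7 cm) are hypothetical values chosen for illustration, because the example gives no measurements.

```python
from math import sqrt
from statistics import NormalDist

def mean_ci(sample_mean, sample_sd, n, confidence=0.95):
    """Normal-approximation confidence interval for a population mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sample_sd / sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Hypothetical summary of the 50 sampled heights (cm).
for level in (0.90, 0.95, 0.99):
    low, high = mean_ci(175.0, 7.0, 50, level)
    print(f"{level:.0%} CI: {low:.1f} to {high:.1f} cm")
```

Running the loop shows the interval widening as the confidence level rises, which matches the point that higher confidence requires a larger interval.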

### Pitfalls in hypothesis testing

• Do not base your hypothesis on what you see in the data.
• Do not make your H0 what you want to show to be true.
• Check the conditions.
• Do not accept the H0, instead fail to reject it.
• Do not confuse practical significance and statistical significance (e.g., with a large enough sample size, you may find that Drug X lowers systolic BP by 2 mm Hg. Even if this is statistically significant, is this clinically significant for your patient?)
• If you fail to reject the H0, do not assume that a larger sample size will lead to rejection.
• Be sure to think about whether it is reasonable to assume that events are independent.
• Do not interpret p-values as the probability that the H0 is true.
• Even a test carried out perfectly can be wrong.

## Statistical Tests

### Choosing the right test

Your choice of test is based on:

• The types of variables you are testing (both your test “exposure” and your “outcome”)
• Quantitative: continuous (age, weight, height) versus discrete (number of patients)
• Categorical: ordinal (rankings; e.g., grades, clothing size), nominal (groups with names; e.g., marital status), or binary (data with only a “yes/no” answer; e.g., alive or dead)
• Whether or not your data meet certain criteria known as assumptions; common assumptions include:
• Data points are all independent of one another.
• Variance within a single group is similar among all groups.
• Data follow a normal distribution (bell curve).

The reasonability of the model should always be questioned. If the model is wrong, so is everything else.

Be careful of variables that are not truly independent.

### Types of tests

The 3 primary categories of statistical tests are:

1. Regression tests: assess cause-and-effect relationships
2. Comparison tests: compare the means of different groups (require quantitative outcome data)
3. Correlation tests: look for associations between different variables

### Chi-square test (χ²)

Chi-square tests are commonly used to analyze categorical data and determine whether 2 categorical variables are related.

• What chi-square tests can assess:
• Whether or not a statistically significant association is present between 2 variables
• Analyzed data: typically “counted” categorical data, meaning you have a number of named categories, and your data points are the counted values for each category.
• Appropriate for large samples, where it closely approximates the exact result; for small samples, Fisher’s exact test is more accurate
• What chi-square tests cannot assess:

In order to perform a chi-square test, 2 pieces of information are needed: the degrees of freedom (number of categories minus 1), and the α-level (which is chosen by the researcher and usually set at 0.05). In addition, the data should be organized in a table.

Example: If you wanted to see whether jugglers were more likely to be born during a particular season, the data could be recorded in the following table:

To begin, the expected frequencies for each cell in the table above need to be determined using the equation:

$$\text{Expected frequency} = n\,p_{0i}$$

where n = the sample size and p0i is the hypothesized proportion in each category i

In the above example, n = 300 and p0i is ¼, so the expected cell frequency is 300 * 0.25 = 75 in each cell.

The test statistic is then calculated by the standard chi-square formula:

$$\chi^{2} = \sum_{\text{all cells}} \frac{(\text{observed}-\text{expected})^{2}}{\text{expected}}$$

where χ² is the test statistic being calculated. For each “cell” or category, the expected frequency is subtracted from the observed frequency; this value is squared and then divided by the expected frequency. After this number is calculated for each category, the numbers are added together.

Example χ² calculation: Using the example above, the expected frequency in each cell is 75, so the χ² test statistic can be calculated as follows:

χ² = 1.08 + 0.653 + 0.013 + 0.12 = 1.866
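The same calculation can be sketched in code. The observed counts below are an assumption: they are one set of values consistent with n = 300 and with the 4 terms reported above, listed in no particular seasonal order, and they are not taken from the original data table.

```python
# Assumed observed counts for the 4 seasons (order is arbitrary).
observed = [66, 82, 74, 78]

n = sum(observed)
expected = n * 0.25  # hypothesized proportion of 1/4 per season

# One (observed - expected)^2 / expected term per cell, then the sum.
terms = [(count - expected) ** 2 / expected for count in observed]
chi_square = sum(terms)

for count, term in zip(observed, terms):
    print(f"observed {count}, expected {expected:.0f}, term {term:.3f}")
print(f"chi-square = {chi_square:.3f}")
```

Summing the unrounded terms gives a value that differs from 1.866 only in the last decimal place, because the text adds terms that were already rounded.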

Determining whether or not the test statistic is statistically significant:

To determine whether this test statistic is statistically significant, the chi-square table is used to obtain the chi-square critical number.

• The table has degrees of freedom (number of categories minus 1) on the y-axis and the α-level on the x-axis.
• Using the degrees of freedom and α-level from the study, you find the critical number on the chart (see example chart below).
• The critical number is used to determine statistical significance by comparing it to the test statistic.
• If the test statistic > critical value:
• The observed frequencies are far away from expected frequencies
• Reject the null hypothesis in favor of the alternative hypothesis based on this α-level.
• If the test statistic < critical value:
• The observed frequencies were close to the expected frequencies
• Do not reject the null hypothesis based on this α-level.

Example χ² test: Are jugglers more likely to be born in a particular season at a 0.05 significance level?

• There are 4 different seasons, so there are 3 degrees of freedom.
• α-level = 0.05
• Using the table above, the critical number is 7.81
• Therefore, we will reject our null hypothesis if the test statistic is > 7.81.

χ² = 1.08 + 0.653 + 0.013 + 0.12 = 1.866

Since 1.866 is < 7.81 (our critical value), we fail to reject the null hypothesis: the data do not provide evidence that season of birth is associated with juggling.
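The comparison against the critical value can also be expressed as a p-value. The sketch below uses a closed-form upper-tail probability that is valid only for 3 degrees of freedom; the helper name `chi2_sf_df3` is hypothetical, and for other degrees of freedom a statistics package or table would be needed.

```python
from math import erfc, exp, pi, sqrt

def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom.

    Closed form that holds only for df = 3.
    """
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)

chi_square = 1.08 + 0.653 + 0.013 + 0.12  # terms reported in the text
critical_value = 7.81                      # df = 3, alpha = 0.05 (from the text)
alpha = 0.05

p_value = chi2_sf_df3(chi_square)
reject_h0 = chi_square > critical_value

print(f"chi-square = {chi_square:.3f}, p = {p_value:.2f}, reject H0: {reject_h0}")
```

Comparing the statistic with the critical value and comparing the p-value with α are 2 views of the same decision, so they lead to the same conclusion.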

Common pitfalls:

• Do not use chi-square unless the data are counted.
• Beware of large sample sizes, as degrees of freedom do not increase.

### Fisher’s exact test

Similar to the χ² test, Fisher’s exact test is a statistical test used to determine whether there are nonrandom associations between 2 categorical variables.

• Used to analyze data found in contingency tables and determine the deviation of data from the null hypothesis (i.e., the p-value)
• For example: comparing 2 possible “exposures” (smoking versus not smoking) with 2 possible outcomes (develops lung cancer versus healthy)
• Contingency tables may have > 2 “exposures” or > 2 outcomes
• More accurate for small data sets
• Fisher’s test gives exact p-values based on the table.
• Complicated formula to calculate the test statistic, so typically calculated with software.

A 2 × 2 contingency table is set up like this:

| | Outcome 1 | Outcome 2 | Row total |
| --- | --- | --- | --- |
| Exposure 1 | a | b | a + b |
| Exposure 2 | c | d | c + d |
| Column total | a + c | b + d | n |

The probability, p, of observing this exact table under the null hypothesis is calculated using the following formula:

$$p = \frac{\binom{a+b}{a}\binom{c+d}{c}}{\binom{n}{a+c}} = \frac{\binom{a+b}{b}\binom{c+d}{d}}{\binom{n}{b+d}} = \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{a!\,b!\,c!\,d!\,n!}$$

where a, b, c, and d are the numbers from the cells of a basic 2 × 2 contingency table, and n = a + b + c + d. The p-value of the test is the sum of these probabilities over the observed table and all tables with the same row and column totals that are at least as extreme.
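The formula above can be evaluated directly with binomial coefficients. The following is a minimal sketch, not a replacement for statistical software: the cell counts are hypothetical, the function names are illustrative, and the two-sided p-value is taken (by one common convention) as the sum of the probabilities of all tables with the same margins that are no more likely than the observed table.

```python
from math import comb

def table_probability(a, b, c, d):
    """Probability of one 2 x 2 table when the row and column totals are fixed."""
    n = a + b + c + d
    return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)

def fisher_exact_two_sided(a, b, c, d):
    """Sum the probabilities of all tables no more likely than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    p_observed = table_probability(a, b, c, d)
    p_value = 0.0
    # Enumerate every value of the top-left cell allowed by the margins.
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        p = table_probability(x, row1 - x, col1 - x, row2 - (col1 - x))
        if p <= p_observed * (1 + 1e-9):  # small tolerance for float ties
            p_value += p
    return p_value

# Hypothetical counts: a = 3, b = 1, c = 1, d = 3.
print(table_probability(3, 1, 1, 3))
print(fisher_exact_two_sided(3, 1, 1, 3))
```

Because every possible table is enumerated, the result is exact rather than an approximation, which is why the test suits small data sets.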

## Graphical Representation of Data

### Purpose

Before any calculations are made, data should be presented in a simple graphical format (e.g., bar graph, scatter plot, histogram).

• The characteristics of the distribution of data will indicate the statistical tools that will be needed for analysis.
• Graphs are the 1st step in data analysis, allowing for the immediate visualization of distributions and patterns, which will determine the next steps of statistical analysis.
• Outliers can be an indication of mathematical or experimental errors.
• There are many ways to graphically represent data.
• After calculations are completed, visual presentation can assist the reader in conceptualizing the results.

### Displaying a relationship between variables

Contingency tables:

• Tables showing the relative frequencies of different combinations of variables
• Example: Comparing the results of a screening test (positive or negative) with whether or not people actually have a disease. (Note: This specific type of contingency table can be used to calculate the sensitivity and specificity of a screening test.)

Scatter diagram or dispersion Dispersion Central tendency is a measure of values in a sample that identifies the different central points in the data, often referred to colloquially as “averages.” The most common measurements of central tendency are the mean, median, and mode. Identifying the central value allows other values to be compared to it, showing the spread or cluster of the sample, which is known as the dispersion or distribution. Measures of Central Tendency and Dispersion diagrams:

• A method commonly used to display the relationship Relationship A connection, association, or involvement between 2 or more parties. Clinician–Patient Relationship between 2 numerical variables or 1 numerical variable Variable Variables represent information about something that can change. The design of the measurement scales, or of the methods for obtaining information, will determine the data gathered and the characteristics of that data. As a result, a variable can be qualitative or quantitative, and may be further classified into subgroups. Types of Variables and 1 categorical variable Variable Variables represent information about something that can change. The design of the measurement scales, or of the methods for obtaining information, will determine the data gathered and the characteristics of that data. As a result, a variable can be qualitative or quantitative, and may be further classified into subgroups. Types of Variables
• The dots represent the values of individual data points.
• Allows for calculation of a “best fit line” representing the data as a whole
• Allows for easy visualization of the entire data set
• Example: scatter diagram showing the relationship between 2 numerical variables
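
One common way to obtain a "best fit line" is ordinary least squares. The sketch below assumes that method (the text does not specify one) and uses hypothetical data points; `best_fit_line` is an illustrative helper name, not a library function.

```python
def best_fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical paired measurements (each pair is 1 dot on the scatter diagram).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = best_fit_line(xs, ys)
print(f"y = {slope:.2f}x + {intercept:.2f}")
```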

Box plots:

• Show the spread and center of the data set
• Visually express a 5-number summary (described here for a box plot drawn horizontally):
1. The minimum value is shown at the end of the left whisker.
2. The first quartile (Q1) is the left edge of the box.
3. The median is shown as the line inside the box.
4. The third quartile (Q3) is the right edge of the box.
5. The maximum value is shown at the end of the right whisker.
• Typically used when comparing the centers and distributions of 2 or more populations
• Example: The following box plot compares the average incubation periods between different variants of the novel coronavirus (nCoV), SARS, and Middle East respiratory syndrome (MERS).
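
The 5-number summary behind a box plot can be computed directly. This is a minimal sketch using Python's standard `statistics` module with hypothetical data. Quartile conventions differ between software packages, and the `"inclusive"` method used here is only one of them.

```python
import statistics

# Hypothetical observations (for example, incubation periods in days).
data = [2, 3, 4, 5, 5, 6, 7, 9, 12]

# quantiles(..., n=4) returns the 3 cut points Q1, median, Q3.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")

five_number_summary = {
    "minimum": min(data),
    "Q1": q1,
    "median": median,
    "Q3": q3,
    "maximum": max(data),
}

for name, value in five_number_summary.items():
    print(f"{name}: {value}")
```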

Kaplan-Meier survival curves:

• A type of statistical analysis used to estimate time-to-event data (typically, survival data)
• Commonly used in medical studies to show how a particular treatment can affect/prolong survival
• The line represents the proportion of patients surviving (or who have not yet reached a certain end point) at a given point in time.
• Example: The survival curve below shows how 2 different gene signatures affect survival. The study begins at time point 0, with 100% of the 2 groups surviving. Each drop-off in the line represents people dying in each group, decreasing the percentage of people who remain living. After 3 years, approximately 50% of people with the Gene A signature are still alive, compared with only 5% of those who have the Gene B signature.
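
The drop-offs in a Kaplan-Meier curve come from the product-limit estimate: at each time with at least 1 death, the running survival probability is multiplied by (1 − deaths / number still at risk). The sketch below illustrates this with hypothetical follow-up data; `kaplan_meier` is an illustrative helper, and an `event` value of `False` is assumed to mean the person was censored (left the study alive).

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] for each time with at least 1 death."""
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        # Deaths at time t, and everyone (deaths + censored) leaving the risk set at t.
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up times (years) and whether death was observed (True) or censored (False).
times = [1, 2, 2, 3, 4, 5]
events = [True, True, False, True, False, True]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t = {t}: {s:.1%} surviving")
```

Censored observations do not cause a drop in the curve, but they do reduce the number at risk for later time points.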

### Presentation of numerical variables

Tables (a frequency table is 1 example):

• The simplest form of presenting data
• Data are displayed in columns and rows.

Histograms:

• Good for demonstrating the results of continuous data, such as:
• Weights
• Heights
• Lengths of time
• Similar to, but not the same as, bar graphs (which display categorical data)
• A histogram divides the continuous data into intervals or ranges.
• The height of each bar represents the number of data points that fall into that range.
• Because histograms represent continuous data, they are drawn with no gaps between bars.
• Example: A histogram showing how many people lost or gained weight over a 2-week study period. In this example, 1 person lost between 2.5 and 3 pounds, 27 people gained between 0 and 0.5 pounds, and 6 people gained between 1 and 1.5 pounds.
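
The binning step behind a histogram can be sketched as follows. The weight changes are hypothetical and are not the data from the example above. `histogram_counts` is an illustrative helper that assigns each value to a half-open interval [lower, upper) of fixed width.

```python
import math
from collections import Counter


def histogram_counts(values, bin_width):
    """Map each (lower, upper) interval to the number of values falling in it."""
    bin_indices = Counter(math.floor(v / bin_width) for v in values)
    return {
        (k * bin_width, (k + 1) * bin_width): bin_indices[k]
        for k in sorted(bin_indices)
    }


# Hypothetical weight changes in pounds (negative = weight lost).
weight_changes = [-2.7, -0.4, 0.1, 0.2, 0.3, 0.6, 1.2, 1.4]

counts = histogram_counts(weight_changes, bin_width=0.5)
for (lower, upper), count in counts.items():
    # Print a simple text bar whose length equals the count in that interval.
    print(f"{lower:+.1f} to {upper:+.1f}: {'#' * count}")
```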

Frequency polygon charts:

### Presentation of categorical variables

Frequency tables, bar graphs, and pie charts are 3 of the most common ways to present categorical data.

Frequency tables:

• Display numbers and/or percentages for each value of a variable
• Example: Pull up to 100 different stoplights and record whether the light was red, yellow, or green upon your arrival.
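
A frequency table for the stoplight example might be built as in the sketch below. The 100 observations are hypothetical counts made up for illustration.

```python
from collections import Counter

# Hypothetical record of the light color seen on arrival at 100 stoplights.
observations = ["red"] * 45 + ["green"] * 40 + ["yellow"] * 15

counts = Counter(observations)
total = sum(counts.values())

# Percentage of all observations for each color.
percentages = {color: 100 * n / total for color, n in counts.items()}

print(f"{'Color':<8}{'Count':>6}{'Percent':>9}")
for color, n in counts.most_common():
    print(f"{color:<8}{n:>6}{percentages[color]:>8.1f}%")
```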

Bar graphs:

• The length of each bar indicates the number or frequency of that category in the data set; bars can be plotted vertically or horizontally
• Example: A bar graph showing the breakdown of race/ethnicity in Texas in 2015.

Pie charts:

• Demonstrate the relative proportions of the different categories of a categorical variable
• Example: The following pie chart shows the results of the European Parliament election in 2004, with each color representing a different political party and the percentage of votes they received.
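
In a pie chart, each category's slice is proportional to its share of the total, so its angle is that share multiplied by 360 degrees. The sketch below uses hypothetical vote counts and party names, not the actual election results.

```python
# Hypothetical vote counts per party (illustrative only).
votes = {"Party A": 50, "Party B": 30, "Party C": 20}

total_votes = sum(votes.values())

# Share of the whole (as a percentage) and the matching slice angle in degrees.
shares = {party: 100 * n / total_votes for party, n in votes.items()}
angles = {party: 360 * n / total_votes for party, n in votes.items()}

for party in votes:
    print(f"{party}: {shares[party]:.0f}% of votes, {angles[party]:.0f} degree slice")
```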
