Philosophical Excursion: Philosophy of Science According to Karl Popper
The Austrian philosopher Karl Popper (1902–1994) developed the philosophy of science known as critical rationalism. His answer to the question of where empirical research ends (the demarcation problem) is falsifiability: this criterion separates empirical from non-empirical statements.
According to this criterion, an empirical theory must include at least 1 observation statement that can logically be contradicted. (Falsifiability is one of the most important characteristics of a scientific hypothesis; see the characteristics of scientific hypotheses listed below.) Here is an example:
- ‘Tomorrow it will snow’ – falsifiable
- ‘Tomorrow it will snow or it won’t’ – not falsifiable, but tautological (= for logical reasons always true)
According to Popper, science never starts with observations (inductive approach) but always with assumptions (deductive approach).
Popper: Although observations can never prove scientific hypotheses (verification), they can demonstrate their falsehood (falsification).
Example: The observation of a black swan falsifies once and for all the hypothesis that all swans are white.
How Is a Scientific Study Conducted?
The following list gives a rough overview of the process of a scientific study. The individual topics, from forming a hypothesis to research criteria, are elaborated in detail below:
- Defining a problem or research topic: This is usually the first step, in which one identifies research gaps and knowledge gaps to be filled by the findings of the research.
- Forming a hypothesis: What is the research question? What is the hypothesis? This is the clear, concise main idea of your research, and it is tied to the research topic.
- Operationalization: Describes how the theoretical construct can be made ‘measurable’.
- Research criteria: Quality criteria of a psychometric test: objectivity, reliability, and validity.
- Research planning: Type of investigation and its process need to be carefully planned.
- Methods of data collection: Psychological tests, interviews, systematic observations, and the recording of psychophysiological processes. Choosing among them is part of formulating the design of your study.
- Data analysis: Analysis through statistical tests.
- Evaluation of the results: The results have to be repeatable and generalizable.
Forming Hypotheses and Theories
While the terms hypothesis and theory are used synonymously in everyday life, there are significant differences between them in science.
Hypotheses are preliminary answers to research questions, that is, scientific assumptions about the relationship between 2 variables that can be empirically tested. The inductive approach derives general statements from individual observations. When hypotheses are sufficiently confirmed and form a system, this system is called a theory.
A theory is based on hypotheses (inductive approach) and is fundamental to the derivation of hypotheses (deductive approach).
Clinical excursion: As a doctor, you formulate hypotheses every day in your practical work: suspected diagnoses. Using diagnostic methods and/or observing the effects of therapies, you try to confirm these hypotheses or, in light of new information, you modify them or formulate new ones.
Important characteristics of scientific hypotheses (by Bortz & Döring 2005):
- Empirical testability: Scientific hypotheses must include real facts that can be empirically investigated.
- Formulating conditional sentences: Scientific hypotheses have to be at least implicitly based on an intelligible statement of the form ‘if x, then y’ or of the form ‘the more x, the more y’.
- Generalizability and degree of generality: Scientific hypotheses need to make statements beyond an individual case or a singular event.
- Falsifiability: Scientific hypotheses must be disprovable (falsifiable) and must not be formulated in such a way that they are always true (tautologies).
- Power of prediction: A hypothesis must allow predictions about future observations based on the findings of the study.
- Simplicity: A hypothesis should be simple, without unnecessary complexity.
- Conceptual clarity: A hypothesis must be conceptually clear and free of ambiguity.
Formulating hypotheses:
- Probabilistic hypotheses: They represent the most common hypotheses in psychological research. They are assumptions about the probability that an event occurs under certain circumstances. They comprise statements about correlations and means that are characteristic of the totality of events. E.g., smoking is a risk factor for cardiovascular diseases.
- Deterministic hypotheses: These hypotheses are not limited in space and time. They represent concrete factual statements. E.g., if I drop an item on earth, it will fall.
- Difference hypotheses: They assert a difference between at least 2 populations in 1 variable and are tested by comparing frequencies or means. E.g., smokers have a higher risk of lung cancer than non-smokers.
- Correlation hypotheses: They assert a correlation between at least 2 variables. E.g., nutrition is correlated with education.
- Null hypothesis (H0) and the alternative hypothesis (H1): The alternative hypothesis is the one that a scientist would like to support. Its opposite, the null hypothesis, is formulated and tested; H1 is accepted only if H0 can be rejected. This formulation of the null hypothesis reflects the principle of falsification.
Alpha error and beta error
There are 2 kinds of errors:
Alpha error (type 1 error) | Beta error (type 2 error) |
H0 is rejected, even though H0 is correct | H0 is retained (H1 is rejected), even though H1 is correct |
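The two error types can be made concrete with a small calculation. The sketch below assumes a one-sided z-test with a known standard deviation. The function name `error_rates` and all numbers (means of 100 and 105, standard deviation 15, sample size 36, alpha of 0.05) are hypothetical values chosen for illustration and do not come from the text.

```python
from statistics import NormalDist

def error_rates(mu0, mu1, sigma, n, alpha=0.05):
    """One-sided z-test of H0: mean = mu0 against H1: mean = mu1 (mu1 > mu0).

    Returns the critical sample mean and the beta error (type 2 error).
    """
    se = sigma / n ** 0.5  # standard error of the sample mean
    # H0 is rejected when the sample mean exceeds this critical value.
    # By construction, this happens with probability alpha if H0 is true.
    critical = NormalDist(mu0, se).inv_cdf(1 - alpha)
    # Beta: probability of staying below the critical value although H1 is true.
    beta = NormalDist(mu1, se).cdf(critical)
    return critical, beta

# Hypothetical numbers, for illustration only.
critical, beta = error_rates(mu0=100, mu1=105, sigma=15, n=36)
print(f"reject H0 above {critical:.2f}; beta = {beta:.3f}; power = {1 - beta:.3f}")
```

Lowering alpha moves the critical value upward, which makes the alpha error rarer but the beta error more likely for the same sample size.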
Construct and Operationalization: How Can the Theoretical Construct Be Made Measurable?
Measuring the heart rate is far easier than identifying thoughts or feelings. Operationalization enables the measurement of a phenomenon that is not directly measurable. To measure constructs, variables are required.
Variables are characteristics under investigation that can take different values (e.g., sex: male or female). The opposite of a variable is a constant, which has the same value in every case.
Scaling and construction of indices
Scales are reference systems for measuring the expression of characteristics (qualitatively ‘either-or’ or quantitatively ‘gradations’). For an overview of important terms about scales, refer to the following table:
Term | Description | Example |
Rank order | Regarding the characteristic, persons are ordered hierarchically | – |
Pairwise comparison | Making pairwise comparisons | Comparison of eyeglass lenses |
Rating scale | Gradations between the extreme poles | Satisfaction scales (‘very satisfied’ to ‘very dissatisfied’) |
Likert scale | Rating scale with usually 5 points; the item scores are added up to an overall test score at the end | Anxiety scales |
Polarity profile | Measuring the association of opposite pairs | Detecting stereotypes and attitudes (e.g., towards homosexuality) |
Visual analog scale | Rating scale with a continuum between the 2 poles instead of discrete gradations | Pain scales |
Guttman scale | Statements are sorted in a specific order (from ‘normal’ to ‘extreme’). If a statement is affirmed, usually all preceding statements are affirmed as well. If this holds without exception, the Guttman scale is ‘perfect’. | Specifying weight: > 60 kg (132.3 lb), > 70 kg (154.3 lb), > 80 kg (176.4 lb), etc. |
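Two of the scale types in the table lend themselves to a short sketch: the summed score of a Likert scale and the check for a ‘perfect’ Guttman pattern. The function names `likert_total` and `is_perfect_guttman` and the example answers are hypothetical and serve only as an illustration.

```python
def likert_total(responses, points=5):
    """Add up the item ratings of one respondent (each rating from 1 to `points`)."""
    if any(not 1 <= r <= points for r in responses):
        raise ValueError("rating outside the scale")
    return sum(responses)

def is_perfect_guttman(pattern):
    """Return True if no affirmed statement (1) follows a rejected one (0).

    `pattern` lists the answers in scale order, from 'normal' to 'extreme'.
    """
    return all(earlier >= later for earlier, later in zip(pattern, pattern[1:]))

# Hypothetical answers, for illustration only.
print(likert_total([4, 2, 5, 3, 1]))     # overall score of a 5-item scale
print(is_perfect_guttman([1, 1, 0, 0]))  # weight > 60 kg and > 70 kg, but not > 80 kg
print(is_perfect_guttman([1, 0, 1, 0]))  # affirms a later statement after rejecting an earlier one
```

With the weight example from the table, a person of 75 kg affirms ‘> 60 kg’ and ‘> 70 kg’ and rejects everything after that, which is exactly the pattern the second function accepts.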
Scales of measurement
- Nominal scale: Lowest level of scales, only statements about equality or inequality.
- Ordinal scale: Objects are arranged in order without fixed intervals; they can also have the same rank position.
- Interval scale: Additionally, there are equal intervals between the rank positions. The 0 point is set arbitrarily; there is no true 0 point.
- Ratio scale: The highest level of scales. Because there is a true 0 point, statements about the equality/inequality of sums, quotients, etc. can be made. The geometric mean can be calculated.
The following table gives an overview with examples of the scales of measurement:
Scale | Nominal scale | Ordinal scale | Interval scale | Ratio scale |
Scale type | Nonmetric | Nonmetric | Metric | Metric |
Data properties | Simple assignment | Rank order | Equal interval of units | True 0 point |
Statistical measures | Mode, frequency distribution | Additionally: median, quartiles, percentiles, and range | Additionally: arithmetic mean, standard deviation, and kurtosis | Additionally: geometric mean |
Statistical procedures | Chi-square, contingency tables | Nonparametric procedures | Parametric procedures | Parametric procedures |
Examples | Gender, religion, and family status | Footrace rankings, school grades, and education | Temperature in Celsius and intelligence scores | Temperature in Kelvin, time, and length |
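The link between scale level and permissible statistical measures can be sketched with Python's standard `statistics` module. All data values below are hypothetical. The last lines illustrate why ratios are only meaningful on a scale with a true 0 point: the same two temperatures give different ratios in Celsius and in Kelvin.

```python
from statistics import geometric_mean, mean, median, mode

# Hypothetical data, for illustration only.
religion = ["A", "B", "A", "C", "A"]  # nominal: only equal / not equal
grades = [1, 2, 2, 3, 5]              # ordinal: rank order, no fixed intervals
celsius = [36.5, 37.0, 38.5]          # interval: equal units, arbitrary 0 point
lengths_cm = [2.0, 8.0]               # ratio: true 0 point

print(mode(religion))                 # most frequent category
print(median(grades))                 # middle rank position
print(mean(celsius))                  # arithmetic mean
print(geometric_mean(lengths_cm))     # geometric mean

# Ratios depend on where the 0 point is: 20 °C is not "twice as warm" as 10 °C.
ratio_celsius = 20 / 10
ratio_kelvin = (20 + 273.15) / (10 + 273.15)
print(ratio_celsius, round(ratio_kelvin, 3))
```

Each higher scale level permits the measures of the lower levels as well, which is what ‘Additionally’ means in the table.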
Test Criteria
What constitutes a psychological test, and according to which quality criteria is it developed? We will address this matter in the following section, especially the quality criteria, as it is a popular topic in your exams.
Test standardization
How can an individual test value be assessed without any standards of comparison?
To assess whether a test score is above average, below average, or average, the mean and standard deviation of a comparison population (norm sample) are necessary. This calibration should be performed with a sample as large as possible and under standardized conditions.
The average test performance of the reference group forms the norm: the average score obtained is the mean, and the measure of the spread of the test scores is the standard deviation.
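How an individual score is placed relative to the norm can be sketched as follows. The norm sample of 3 values is deliberately tiny and hypothetical; a real norm sample would be as large as possible. The conversion to a scale with mean 100 and standard deviation 15 is an assumption borrowed from common intelligence scales, and the function names are made up for this example.

```python
from statistics import mean, stdev

def z_score(raw, norm_mean, norm_sd):
    """Distance of a raw score from the norm mean, in standard deviations."""
    return (raw - norm_mean) / norm_sd

def to_norm_scale(z, scale_mean=100, scale_sd=15):
    """Convert a z-score to a norm scale (assumed here: mean 100, SD 15)."""
    return scale_mean + scale_sd * z

# Hypothetical norm sample, for illustration only.
norm_sample = [16, 20, 24]
norm_mean = mean(norm_sample)  # the norm
norm_sd = stdev(norm_sample)   # spread of the test scores in the norm sample

z = z_score(26, norm_mean, norm_sd)
print(z, to_norm_scale(z))
```

A positive z-score means an above-average result, a negative one a below-average result, and a value near 0 an average result.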
Test theoretical quality criteria: objectivity, reliability, and validity
The 3 most important test quality criteria (objectivity, reliability, and validity) build on one another: without objectivity there is no reliability, and without reliability there is no validity.
Objectivity – Does the test depend on the investigator?
Subjective influences by the investigator should be minimized through standardization and by limiting the investigator's latitude during the evaluation. The correlation coefficient between the results obtained by different investigators indicates how independent the test result is of the investigator.
Reliability – How high is the accuracy of the measurement of the test?
A test can be classified as accurate (reliable) when it yields an identical or very similar result under the same conditions for the same person. What possibilities are available for testing the reliability of a test?
- Test-retest reliability: The name says it all: repetition. The same test is repeatedly applied to the same proband.
- Parallel-forms reliability: Instead of repeating the same test, a parallel form is used to avoid memorization effects.
- Consistency analysis: The test is applied only once. For split-half reliability, the test is split in half and the scores of 1 half are compared with those of the other half. When each single test item is correlated with all others, this is called internal consistency.
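The reliability checks above all come down to correlating two sets of scores. The sketch below computes a Pearson correlation by hand and applies it to a split-half analysis. Splitting into odd and even items is one common choice and is assumed here; the function names and all item ratings are hypothetical.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equally long score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def split_half(item_scores):
    """Correlate the summed odd-numbered items with the summed even-numbered items.

    `item_scores` holds one list of item ratings per person.
    """
    first_half = [sum(person[0::2]) for person in item_scores]   # items 1, 3, ...
    second_half = [sum(person[1::2]) for person in item_scores]  # items 2, 4, ...
    return pearson(first_half, second_half)

# Hypothetical ratings of 4 persons on a 4-item test, for illustration only.
ratings = [
    [4, 5, 4, 5],
    [2, 1, 2, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
]
r_split = split_half(ratings)
print(round(r_split, 3))
```

For test-retest or parallel-forms reliability, the same `pearson` function would be applied to the total scores of the first and second administration.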
It is impossible to achieve absolute measurement accuracy with a psychological test: limited reliability results in a standard error of measurement. Here, 2 factors are taken into consideration:
- Correlation coefficient (reliability coefficient): measure of the measurement accuracy
- Standard deviation: measure of the variation in the distribution of test scores
From these, the confidence interval is calculated: the test score of the proband ± a multiple of the standard error of measurement gives the range that contains the true value of the proband with a given probability (usually 95%, which corresponds to ± 1.96 standard errors).
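The confidence interval can be sketched numerically. The text does not spell out a formula, so the sketch assumes the usual one from classical test theory, in which the standard error of measurement equals the standard deviation times the square root of (1 − reliability), and it assumes normally distributed measurement errors. The score of 110, the standard deviation of 15, and the reliability of 0.91 are hypothetical.

```python
from statistics import NormalDist

def confidence_interval(score, sd, reliability, level=0.95):
    """Confidence interval around an observed test score.

    Assumes SEM = sd * sqrt(1 - reliability) and normally distributed errors.
    """
    sem = sd * (1 - reliability) ** 0.5        # standard error of measurement
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return score - z * sem, score + z * sem

# Hypothetical values, for illustration only.
low, high = confidence_interval(score=110, sd=15, reliability=0.91)
print(f"{low:.2f} to {high:.2f}")
```

The lower the reliability of the test, the wider the interval becomes; with a reliability of 1, it would shrink to the observed score itself.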
Validity – Does the test measure the characteristic that it purports to measure?
There are 2 different kinds of validation:
- Internal validation: The test is considered by itself (internally). The variation of the dependent variable is caused by the variation of the independent variable. The results either support or reject the hypothesis.
- External validation: For this purpose, external criteria are taken into account. The results of the study can be generalized and extended to other situations/populations. When the test distinguishes significantly between groups of people, this is called discriminant validity. A test has predictive validity when its results allow predictions.
Quality criteria: sensitivity, specificity, and predictive values
Using a psychological test, you would like to assign a person to a category (e.g., patient with depression, ADHD or the like). For making a medical diagnosis, you conduct specific tests and come to a diagnostic decision based on the results and your expert judgment.
The risk of error in such a decision is high. To evaluate such decisions, decision-theoretic measures come into play: sensitivity, specificity, and predictive values. This way, diagnostic approaches are evaluated for their usefulness.
Verification procedures: Does the diagnostic approach make sense?
The quality and usefulness of a test are examined as follows: a large sample with a positive and a negative category is subjected to a verification procedure that checks whether the test classified each person accurately. For example, the human immunodeficiency virus (HIV) antibody test is checked via the very time-consuming and costly Western blot.
- Sensitivity: The proportion of sick people who are correctly identified as positive by the test (e.g., breast cancer screening).
- Specificity: The proportion of healthy people who are correctly identified as negative by the test.
- Positive predictive value: The probability that a person identified as positive by the test is actually sick.
- Negative predictive value: The probability that a person identified as negative by the test is actually healthy.
With a 2 x 2 schematic of possible decisions, the specific values can be easily calculated:
Diagnosis (test result) | Actual condition: positive (sick) | Actual condition: negative (healthy) | Total |
Positive (sick) | True-positive decision (A) | False-positive decision (B) | Positive predictive value: A / (A + B) |
Negative (healthy) | False-negative decision (C) | True-negative decision (D) | Negative predictive value: D / (C + D) |
Total | Sensitivity: A / (A + C) | Specificity: D / (B + D) | |
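The four formulas from the 2 x 2 table can be sketched as a small function. The parameter names `a` to `d` follow the cells A to D of the table; the function name and the counts in the example are hypothetical and do not describe any real test.

```python
def diagnostic_measures(a, b, c, d):
    """Quality measures from the 2 x 2 table.

    a: true-positive, b: false-positive, c: false-negative, d: true-negative.
    """
    return {
        "sensitivity": a / (a + c),  # sick people identified as positive
        "specificity": d / (b + d),  # healthy people identified as negative
        "ppv": a / (a + b),          # positive results that are truly sick
        "npv": d / (c + d),          # negative results that are truly healthy
    }

# Hypothetical counts, for illustration only: 100 sick and 900 healthy people.
measures = diagnostic_measures(a=90, b=50, c=10, d=850)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

In this made-up sample, the positive predictive value is much lower than the sensitivity because the healthy group is far larger than the sick group, so even a small share of false-positive decisions outweighs a good part of the true-positive ones.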