Philosophical Excursion: Philosophy of Science According to Karl Popper
The Austrian philosopher Karl Popper (1902 – 1994) developed the philosophy of science known as critical rationalism. His answer to the question of where the boundaries of empirical research lie is called falsification: falsifiability is the criterion that separates empirical from non-empirical statements.
Accordingly, an empirical theory must admit at least one conceivable observation statement that would logically contradict it. (One of the most important characteristics of a scientific hypothesis is its falsifiability; see the characteristics of scientific hypotheses below.) Here is an example:
- “Tomorrow it will snow.” – falsifiable
- “Tomorrow it will snow or it won’t.” – not falsifiable, but tautological (i.e., always true for logical reasons)
According to Popper, science never starts with observations (inductive approach) but always with assumptions from which testable consequences are derived (deductive approach).
Popper: Although observations can never prove scientific hypotheses (verification), they can demonstrate their falsehood (falsification).
Example: The observation of a black swan falsifies once and for all the hypothesis that all swans are white.
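The asymmetry between verification and falsification can be made concrete with a small Python sketch. The function name `check_all_swans_white` and the colour strings are illustrative assumptions, not part of Popper's work: any number of white swans leaves the universal hypothesis merely "not yet falsified", while a single black swan falsifies it.

```python
def check_all_swans_white(observed_colours):
    """Test the universal hypothesis 'All swans are white' against observations.

    A single counterexample falsifies the hypothesis; no number of confirming
    observations can verify it, so the best possible verdict is 'not yet falsified'.
    """
    for colour in observed_colours:
        if colour != "white":
            return "falsified"
    return "not yet falsified"


print(check_all_swans_white(["white"] * 10_000))              # not yet falsified
print(check_all_swans_white(["white"] * 10_000 + ["black"]))  # falsified
```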
How Is a Scientific Study Conducted?
The following list gives a rough overview of the process of a scientific study. The individual steps, from forming a hypothesis to the research criteria, are discussed in more detail below:
- Defining a problem or research topic: This is usually the first step, in which gaps in existing knowledge are identified that the findings of the research are meant to fill.
- Forming a hypothesis: What is the research question? What is the hypothesis? The hypothesis is a clear, concise statement of the main idea of the research and is tied to the research topic.
- Operationalization: Describes how the theoretical construct can be made “measurable.”
- Research criteria: Quality criteria of a psychometric test: objectivity, reliability, validity.
- Research planning: The type of investigation and its procedure need to be carefully planned.
- Methods of data collection: Psychological tests, interviews, systematic observations, recording of psychophysiological processes. Choosing these methods is part of the study design.
- Data analysis: Analysis through statistical tests.
- Evaluation of the results: The results must be replicable and generalizable.
Forming Hypotheses and Theories
While the terms hypothesis and theory are often used synonymously in everyday life, they differ significantly in science.
Hypotheses are preliminary answers to research questions, i.e., scientific assumptions about the relationship between variables that can be empirically tested. The inductive approach derives general statements from individual observations. When hypotheses are sufficiently corroborated and form a system, this is called a theory.
A theory is built on hypotheses (inductive approach) and, in turn, serves as the basis for deriving new hypotheses (deductive approach).
Clinical excursion: As a doctor, you formulate hypotheses every day in your practical work, namely suspected diagnoses. Using diagnostic methods and/or observing the effects of therapies, you try to confirm these hypotheses, or, on the basis of new information, you formulate new hypotheses or modify existing ones.
Important characteristics of scientific hypotheses (according to Bortz & Döring 2005):
- Empirical testability: Scientific hypotheses must refer to real facts that can be empirically investigated.
- Formulation as a conditional sentence: Scientific hypotheses must be based, at least implicitly, on a meaningful statement of the form ‘if x, then y’ or ‘the more x, the more y.’
- Generalizability and degree of generality: Scientific hypotheses need to make statements beyond an individual case or a singular event.
- Falsifiability: Scientific hypotheses must be disprovable (falsifiable) and must not be formulated so that they are always true (tautologies).
- Predictive power: A hypothesis must allow predictions about future events on the basis of the study's findings.
- Simplicity: A hypothesis should be as simple as possible.
- Conceptual clarity: A hypothesis must be conceptually clear and free of ambiguity.
Types of hypotheses:
- Probabilistic hypotheses: These are the most common hypotheses in psychological research. They are assumptions about the probability that an event occurs under certain circumstances. They make statements about correlations and means that characterize the totality of cases, not necessarily every individual case. Example: Smoking is a risk factor for cardiovascular diseases.
- Deterministic hypotheses: These hypotheses claim to hold without exception and are not limited in space and time. They are concrete factual statements. Example: If I drop an object on earth, it will fall down.
- Difference hypotheses: They are tested by comparing frequencies or means and assert a difference between at least two populations with respect to one variable. Example: Smokers have a higher risk of lung cancer than non-smokers.
- Correlation hypotheses: They assert a correlation between at least two variables. Example: Nutrition is correlated with education.
- Null hypothesis (H0) and alternative hypothesis (H1): The alternative hypothesis is the one that a scientist would like to support. The opposing hypothesis, the null hypothesis (e.g., ‘there is no difference’ or ‘there is no correlation’), is formulated as its counterpart, and the study attempts to reject it. Formulating and testing this null hypothesis represents the principle of falsification.
Alpha error and beta error
There are two kinds of errors:
Alpha error (type 1 error) | Beta error (type 2 error) |
H0 is rejected, even though it is true | H0 is retained (H1 is rejected), even though H1 is true |
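Both error types can be illustrated with a small Monte Carlo simulation in Python. This is a sketch under simplifying assumptions that are not part of the text above: normally distributed data with a known standard deviation of 1, a two-sided z-test, samples of n = 30, and an arbitrarily chosen true mean of 0.5 for the case in which H1 is correct. The function name `rejection_rate` is hypothetical.

```python
import random
from statistics import NormalDist, fmean


def rejection_rate(true_mean, n_studies=2000, n=30, alpha=0.05, seed=1):
    """Share of simulated studies in which H0 ('population mean = 0') is rejected.

    Each study draws n values from a normal distribution with the given true mean
    and SD = 1, then applies a two-sided z-test with known SD = 1.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    rejections = 0
    for _ in range(n_studies):
        sample = [rng.gauss(true_mean, 1) for _ in range(n)]
        z = fmean(sample) * n ** 0.5  # sample mean divided by its standard error 1/sqrt(n)
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_studies


alpha_error = rejection_rate(true_mean=0.0)     # H0 true: every rejection is an alpha error
beta_error = 1 - rejection_rate(true_mean=0.5)  # H1 true: every non-rejection is a beta error
print(f"alpha error rate: {alpha_error:.3f}, beta error rate: {beta_error:.3f}")
```

With these settings the alpha error rate should come out close to the chosen α = 0.05, and the beta error rate close to its theoretical value of roughly 0.22. Larger samples or larger true effects reduce the beta error, while the alpha error is fixed by the choice of α.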
Construct and Operationalization: How Can the Theoretical Construct Be Made Measurable?
Measuring heart rate is far easier than capturing thoughts or feelings. Operationalization makes a phenomenon that is not directly observable (a theoretical construct) measurable by specifying observable indicators for it. In order to measure constructs, variables are required.
Variables are the characteristics under investigation, which can take on different values (example: sex, with the values male and female). The opposite of a variable is a constant.
Scaling and construction of indices
Scales are reference systems for measuring how a characteristic is expressed, either qualitatively (‘either-or’) or quantitatively (in gradations). The following table gives an overview of important terms relating to scales:
Term | Description | Example |
Rank order | Persons are ordered hierarchically with respect to the characteristic | – |
Pairwise comparison | Objects are compared with each other in pairs | Comparison of eyeglass lenses |
Rating scale | Gradations between two extreme poles | Satisfaction scales (‘very satisfied’ to ‘very dissatisfied’) |
Likert scale | Rating scale, usually with 5 points; the item scores are added up to an overall test score at the end | Anxiety scales |
Polarity profile | Rating an object on a series of opposite (bipolar) adjective pairs | Detecting stereotypes and attitudes (e.g., towards homosexuality) |
Visual analog scale | Rating scale with a continuous line between two poles instead of discrete gradations | Pain scales |
Guttman scale | Statements are sorted in a specific order (from ‘normal’ to ‘extreme’). If a person affirms a statement, they usually affirm all preceding statements as well; if this holds without exception, the Guttman scale is ‘perfect.’ | Specifying weight: >60 kg, >70 kg, >80 kg, etc. |
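Two of the scale types above involve simple calculations: adding up Likert items into an overall score, and checking whether a response pattern fits a perfect Guttman scale. The following Python sketch assumes that all Likert items are keyed in the same direction (no reverse-scored items) and that the Guttman statements are ordered from ‘normal’ to ‘extreme’; the function names are hypothetical.

```python
def likert_total(responses, n_points=5):
    """Add up Likert item responses (each between 1 and n_points) into an overall score."""
    for r in responses:
        if not 1 <= r <= n_points:
            raise ValueError(f"response {r} is outside 1..{n_points}")
    return sum(responses)


def guttman_pattern(weight_kg, thresholds=(60, 70, 80)):
    """Answers to the statements '>60 kg', '>70 kg', '>80 kg' (1 = affirmed, 0 = denied)."""
    return [1 if weight_kg > t else 0 for t in thresholds]


def is_perfect_guttman(pattern):
    """True if no affirmed statement follows a denied one, i.e. the pattern is 1, ..., 1, 0, ..., 0."""
    denied_before = False
    for answer in pattern:
        if answer == 0:
            denied_before = True
        elif denied_before:  # a more extreme statement affirmed after a milder one was denied
            return False
    return True


print(likert_total([4, 5, 3, 2, 4]))            # 18
print(guttman_pattern(75))                      # [1, 1, 0]
print(is_perfect_guttman(guttman_pattern(75)))  # True
print(is_perfect_guttman([1, 0, 1]))            # False
```

The weight example can never violate the pattern, since anyone above 70 kg is also above 60 kg; for attitude statements, deviations from the perfect pattern are exactly what such a check detects.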
Scales of measurement
- Nominal scale: Lowest scale level; only statements about equality or inequality are possible.
- Ordinal scale: Objects are arranged in a rank order without fixed intervals; several objects can share the same rank position.
- Interval scale: Additionally, the intervals between values are equal; the zero point, however, is set arbitrarily (e.g., 0 °C).
- Ratio scale: Highest scale level; it has a true zero point, so statements about ratios (e.g., ‘twice as long’) can also be made. The geometric mean can be calculated.
The following table gives an overview with examples of the scales of measurement:
Property | Nominal scale | Ordinal scale | Interval scale | Ratio scale |
Scale type | Nonmetric | Nonmetric | Metric | Metric |
Data properties | Simple assignment | Rank order | Equal interval of units | True zero point |
Statistical measures | Mode, frequency distribution | Additionally: median, quartiles, percentiles, range | Additionally: arithmetic mean, standard deviation, kurtosis | Additionally: geometric mean |
Statistical procedures | Chi-square, contingency tables | Nonparametric procedures | Parametric procedures | Parametric procedures |
Examples | Gender, religion, family status | Footrace rankings, school grades, education | Temperature in °C, intelligence scores | Temperature in Kelvin, time, length |
Source: M. Schön (2007): GK1 Medizinische Psychologie und Soziologie, p. 25, Fig. 1.4., Publisher Springer.
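The cumulative logic of the table (each scale level permits the statistics of the levels below it plus additional ones) can be sketched in Python with the standard `statistics` module. The function `summarize` and the sample values are illustrative assumptions; note that the geometric mean additionally requires all values to be positive.

```python
from statistics import fmean, geometric_mean, median, mode, stdev

LEVELS = ["nominal", "ordinal", "interval", "ratio"]


def summarize(values, scale):
    """Return the statistical measures permitted for the given scale level."""
    level = LEVELS.index(scale)  # raises ValueError for an unknown scale name
    result = {"mode": mode(values)}  # permitted from the nominal level upwards
    if level >= LEVELS.index("ordinal"):
        result["median"] = median(values)
    if level >= LEVELS.index("interval"):
        result["mean"] = fmean(values)
        result["sd"] = stdev(values)  # needs at least two values
    if level >= LEVELS.index("ratio"):
        result["geometric_mean"] = geometric_mean(values)  # needs positive values
    return result


print(summarize(["single", "married", "married"], "nominal"))  # {'mode': 'married'}
print(summarize([1, 2, 2, 3, 5], "ordinal"))                    # {'mode': 2, 'median': 2}
print(summarize([10.0, 20.0, 30.0], "interval"))                # temperatures in °C
print(summarize([150.0, 165.0, 180.0], "ratio"))                # lengths in cm

# Ratios are only meaningful on a ratio scale:
print(20.0 / 10.0)                        # 2.0, yet 20 °C is not "twice as warm" as 10 °C
print((20.0 + 273.15) / (10.0 + 273.15))  # about 1.035 on the Kelvin scale
```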
Test Criteria
What constitutes a psychological test, and according to which quality criteria is it developed? The following section addresses these questions, focusing on the quality criteria, as they are a popular topic in your exams.
Test standardization
How can an individual test score be assessed without any standard of comparison?
To assess whether a test result is above average, average, or below average, the mean and standard deviation of a comparison population (norm sample) are needed. This norming should be performed with as large a sample as possible and under standardized conditions.
The average test performance of the reference group constitutes the norm: the average score obtained is the mean, and the spread of the test scores is measured by the standard deviation.
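Comparing an individual score with the norm sample amounts to expressing it as a z-score, i.e., its distance from the norm mean in standard deviations. The Python sketch below assumes hypothetical norm values (mean 50, standard deviation 10), normally distributed scores for the percentile, and the common convention of ±1 standard deviation as the "average" band; it also converts z to the IQ metric (mean 100, SD 15) as one example of a standard scale. All function names are hypothetical.

```python
from statistics import NormalDist


def z_score(raw, norm_mean, norm_sd):
    """Distance of a raw score from the norm mean, in units of the norm SD."""
    return (raw - norm_mean) / norm_sd


def to_iq_metric(z):
    """Express a z-score on the IQ metric (mean 100, SD 15)."""
    return 100 + 15 * z


def percentile(z):
    """Percentage of the norm sample scoring lower, assuming normally distributed scores."""
    return 100 * NormalDist().cdf(z)


def classify(z, band=1.0):
    """Label a score relative to the norm; the +/-1 SD band is an assumed convention."""
    if z > band:
        return "above average"
    if z < -band:
        return "below average"
    return "average"


z = z_score(65, norm_mean=50, norm_sd=10)  # hypothetical raw score and norm values
print(z, to_iq_metric(z), round(percentile(z), 1), classify(z))  # 1.5 122.5 93.3 above average
```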
Test theoretical quality criteria: Objectivity, reliability, and validity
The three most important test quality criteria, objectivity, reliability, and validity, build on one another: without objectivity there is no reliability, and without reliability there is no validity.
Objectivity – Does the test depend on the investigator?
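Subjective influences of the investigator should be minimized by standardizing the test procedure and leaving little room for interpretation during evaluation. The correlation coefficient between the results that different investigators obtain for the same probands indicates to what degree the test result depends on the investigator.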
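A minimal Python sketch of this idea, assuming two raters have scored the same six probands (the scores are invented for illustration) and using the Pearson correlation coefficient; `pearson_r` is a hand-written helper, not a library function.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long lists with at least two values")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


rater_1 = [12, 15, 9, 20, 17, 11]  # invented scores for six probands
rater_2 = [13, 15, 10, 19, 18, 10]
print(round(pearson_r(rater_1, rater_2), 2))  # 0.97
```

A correlation only reflects whether the raters order the probands consistently: a rater who systematically scores every proband two points higher would still yield r = 1.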
Reliability – How high is the accuracy of measurement of the test?
A test is considered accurate (reliable) when it yields an identical or very similar result for the same person under the same conditions. The following methods are available for testing the reliability of a test:
- Test-retest reliability: As the name says, the test is repeated: the same test is administered to the same proband several times, and the results are correlated.
- Parallel-forms reliability: Instead of repeating the same test, a parallel form is administered in order to avoid memorization effects.
- Consistency analysis: The test is administered only once. For split-half reliability, the test is split in half, and the scores on one half are correlated with those on the other half. When each single test item is correlated with all others, this is called internal consistency.
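The two consistency analyses can be sketched in Python for a small, invented matrix of item scores (rows are probands, columns are items). Some assumptions go beyond the text: the split uses odd versus even items, internal consistency is summarized with Cronbach's alpha (a commonly used coefficient), and the Spearman-Brown correction is added because a half-length test is less reliable than the full test. All function names are hypothetical.

```python
from math import sqrt
from statistics import variance


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_half(scores):
    """Correlate each proband's sum on items 1, 3, ... with the sum on items 2, 4, ..."""
    odd = [sum(row[0::2]) for row in scores]
    even = [sum(row[1::2]) for row in scores]
    return pearson_r(odd, even)


def spearman_brown(r_half):
    """Estimate full-length reliability from the correlation of the two halves."""
    return 2 * r_half / (1 + r_half)


def cronbach_alpha(scores):
    """k/(k-1) * (1 - sum of item variances / variance of the total scores)."""
    k = len(scores[0])
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


scores = [  # invented answers of 5 probands to 4 items (1-5 points)
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]
r_half = split_half(scores)
print(round(r_half, 2), round(spearman_brown(r_half), 2), round(cronbach_alpha(scores), 2))  # 0.88 0.94 0.96
```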
Because no psychological test is perfectly reliable, absolute measurement accuracy is impossible: every test score contains a measurement error, quantified by the standard error of measurement. Two factors are taken into consideration:
- Reliability coefficient (a correlation coefficient): measure of measurement accuracy
- Standard deviation: measure of the variation in the distribution of test scores
The standard error of measurement is calculated as standard deviation × √(1 − reliability). From it, a confidence interval (usually 95%) for the proband's true value is calculated: test score of the proband ± 1.96 × standard error of measurement.
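The calculation can be sketched in Python. The test values are hypothetical (an IQ-type test with a norm standard deviation of 15 and a reliability of 0.91), and the critical value for the chosen confidence level is taken from the standard normal distribution via `statistics.NormalDist`; the function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability)."""
    return sd * sqrt(1 - reliability)


def confidence_interval(score, sd, reliability, level=0.95):
    """Interval around the observed score that contains the true score with the given probability."""
    sem = standard_error_of_measurement(sd, reliability)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return score - z * sem, score + z * sem


sem = standard_error_of_measurement(sd=15, reliability=0.91)
low, high = confidence_interval(score=110, sd=15, reliability=0.91)
print(f"SEM = {sem:.1f}; 95% CI: {low:.1f} to {high:.1f}")  # SEM = 4.5; 95% CI: 101.2 to 118.8
```

The interval widens as reliability falls: with a reliability of 0.75, the same standard deviation gives a standard error of measurement of 7.5.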
Validity – Does the test actually measure the characteristic that it purports to measure?
There are two different kinds of validation:
- Internal validation: The study is considered by itself (internally). Internal validity is given when variation of the dependent variable is clearly caused by variation of the independent variable, so that the results unambiguously support or reject the hypothesis.
- External validation: External criteria are taken into account. External validity is given when the results of the study can be generalized and extended to other situations or populations. If the test yields significant differences between groups of people, this is called discriminant validity. A test has predictive validity when its results allow accurate predictions of later behavior or outcomes.
Quality criteria: Sensitivity, specificity and predictive values
By means of a psychological test, you would like to assign a person to a category (e.g., patient with depression or ADHD). Similarly, for a medical diagnosis, you conduct specific tests and reach a diagnostic decision on the basis of the results and your expert judgment.
Such decisions always carry a risk of error. To evaluate them, sensitivity, specificity, and predictive values come into play; they indicate how useful a diagnostic approach is.
Verification procedures: Does the diagnostic approach make sense?
Quality and usefulness of a test are examined as follows: a large sample whose actual category (positive or negative) is established by a verification procedure (reference standard) is tested, and the test's classifications are compared with the actual categories; e.g., the HIV antibody test is checked via the very time-consuming, costly Western blot.
- Sensitivity: The proportion of people who are actually sick that the test correctly identifies as positive (e.g., breast cancer screening).
- Specificity: The proportion of people who are actually healthy that the test correctly identifies as negative.
- Positive predictive value: The probability that a person who tests positive is actually sick.
- Negative predictive value: The probability that a person who tests negative is actually healthy.
Using a 2×2 table of possible decisions, these values can easily be calculated:
Diagnosis (test result) | Actual condition: positive (sick) | Actual condition: negative (healthy) | Predictive value |
Positive (sick) | True positive decision (A) | False-positive decision (B) | Positive predictive value A/(A+B) |
Negative (healthy) | False-negative decision (C) | True negative decision (D) | Negative predictive value D/(C+D) |
Quality criterion | Sensitivity A/(A+C) | Specificity D/(B+D) | |
Source: [3] S. Rothgangel (2010): Kurzlehrbuch Medizinische Psychologie und Soziologie, p. 154, Fig. 4.3, Publisher Thieme.
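The four formulas from the 2×2 table can be sketched in Python. The counts below are invented for illustration (1,000 people, 100 of whom are actually sick), and the class name `TwoByTwo` is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    a: int  # true positive: test positive, actually sick
    b: int  # false positive: test positive, actually healthy
    c: int  # false negative: test negative, actually sick
    d: int  # true negative: test negative, actually healthy

    def sensitivity(self):
        return self.a / (self.a + self.c)

    def specificity(self):
        return self.d / (self.b + self.d)

    def positive_predictive_value(self):
        return self.a / (self.a + self.b)

    def negative_predictive_value(self):
        return self.d / (self.c + self.d)


table = TwoByTwo(a=90, b=40, c=10, d=860)
for name in ("sensitivity", "specificity", "positive_predictive_value", "negative_predictive_value"):
    print(f"{name}: {getattr(table, name)():.3f}")
```

With these numbers the test has a sensitivity of 0.90 and a specificity of about 0.96, yet only about 69% of positive results are true positives, because only 10% of the sample is actually sick: predictive values depend on how common the condition is.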
Review Questions
The answers can be found below the references.
1. Reaction time is measured in an experiment and the measurement values are specified in seconds. On which scale does the measurement occur?
- Interval scale
- Nominal scale
- Ordinal scale
- Ratio scale
- Relative judgment scale
2. A practicing doctor notices in her practice that many patients who receive medication Y complain about nausea. She concludes that Y causes nausea as a side effect. Which term best describes this procedure?
- Deduction on the basis of a clinical observation
- Deduction on the basis of a systematic testing
- Induction on the basis of a clinical observation
- Induction on the basis of a systematic testing
- Verification on the basis of a clinical observation
3. Reliability indicates the measurement accuracy of a test, i.e., how reliable it is. Which procedure does not measure reliability?
- Inter-rater reliability
- Parallel-forms reliability
- Test of internal consistency
- Test-retest reliability
- Split-half reliability