Why do taller people tend to weigh more? Why does income tend to rise with education level? Why do students who score well on mid-terms tend to score well on finals? The answer lies in correlation — one of the most foundational, debated, and philosophically rich concepts in all of empirical science. At its most precise, it is measured by Pearson's r, the product-moment correlation coefficient.
The "Temperature and Ice Cream" Analogy 🍦
Suppose a researcher notices that on hotter days, more ice cream is sold. They also notice that on those same hot days, more people drown at beaches. Should we conclude that eating ice cream causes drowning? Of course not. Both are driven by a third variable: hot weather brings people to beaches and also increases ice cream consumption. This is the single most important lesson in correlation: r tells you that two variables move together. It does not — cannot — tell you why. Understanding this distinction is the difference between science and superstition.
I. The Philosophical Foundation
1.1 The Problem of Co-variation: Hume's Legacy
The idea that two things are "related" has ancient roots, but it was the Scottish philosopher David Hume (1711–1776) who first articulated the problem precisely. In his Enquiry Concerning Human Understanding, Hume asked: how do we know that because A and B have occurred together in the past, they will continue to do so in the future? This is the problem of induction.
Pearson's r is, in a deep sense, a formalization of Hume's insight: it tells us the degree to which two variables have co-varied in our observed sample — and through hypothesis testing, whether that co-variation is strong enough to be unlikely to have arisen by pure chance. But it cannot peer behind the curtain of causality.
1.2 Francis Galton and the Birth of Correlation
Sir Francis Galton (1822–1911) first described the concept of co-relation in his study of heredity in the 1880s. He noted that traits like height and arm span "co-relate" across generations. His student, Karl Pearson (1857–1936), formalized this into the mathematical coefficient we use today, publishing his product-moment correlation formula in 1895. The collaboration between Galton, Pearson, and later Ronald Fisher gave us not just the formula, but the inferential framework: the t-test for significance, which tells us whether r is large enough to trust.
1.3 What Correlation Actually Measures
Pearson's r measures the strength and direction of the linear relationship between two continuous variables. It is bounded between −1 and +1. Three anchor values to remember:
- r = +1: Perfect positive linear relationship. As X increases by one unit, Y increases by a perfectly proportional amount. All points fall exactly on a line with positive slope.
- r = 0: No linear relationship. X and Y are linearly unrelated. (They could still have a non-linear relationship — e.g., a U-shaped curve.)
- r = −1: Perfect negative linear relationship. As X increases, Y decreases in perfect proportion.
⚠️ The Non-Linearity Trap
Pearson's r only captures linear association. A dataset where Y = X² will show r ≈ 0 even though Y is perfectly determined by X. Always visualise your data with a scatterplot before interpreting r. An r near zero does NOT necessarily mean "no relationship" — it means "no linear relationship."
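The trap is easy to see numerically. In this minimal sketch (the data are illustrative), Y is perfectly determined by X, yet r comes out at zero because the relationship is quadratic rather than linear:

```python
import numpy as np

# Y is a deterministic function of X, but the relationship is quadratic.
x = np.linspace(-3, 3, 101)   # symmetric grid around zero
y = x ** 2                    # perfect non-linear dependence

# Pearson r is the off-diagonal entry of the 2x2 correlation matrix.
r = np.corrcoef(x, y)[0, 1]
print(abs(round(r, 4)))       # ≈ 0: no *linear* association detected
```

Because x is symmetric around zero, the positive and negative deviations cancel in the covariance, so r vanishes even though knowing X tells you Y exactly.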
II. The Mathematics
2.1 The Pearson r Formula
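The formula itself does not appear in this section; the standard product-moment form, for n paired observations (xᵢ, yᵢ) with sample means x̄ and ȳ, is:

```latex
r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}
```

Equivalently, r is the covariance of X and Y divided by the product of their standard deviations, which is the form Section 2.2 relies on.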
2.2 Why Standardise? The Dimensionless Nature of r
One of the most elegant properties of Pearson's r is that it is dimensionless — it has no units. Whether you are correlating height in metres with weight in kilograms, or exam scores with hours studied, r always falls between −1 and +1. This is because the formula divides the covariance by the product of the standard deviations, effectively "standardising" both variables. You are measuring not how much the variables change together, but how consistently they change together relative to their own variability.
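This invariance is easy to verify: rescaling either variable by a positive linear transformation (metres to centimetres, kilograms to pounds) leaves r unchanged. A minimal sketch with illustrative data:

```python
import numpy as np

height_m = np.array([1.60, 1.65, 1.70, 1.75, 1.80, 1.85])
weight_kg = np.array([55.0, 62.0, 66.0, 71.0, 80.0, 84.0])

r_metric = np.corrcoef(height_m, weight_kg)[0, 1]

# Convert units: positive linear rescalings of each variable.
r_imperial = np.corrcoef(height_m * 100, weight_kg * 2.20462)[0, 1]

print(np.isclose(r_metric, r_imperial))  # True: r is unit-free
```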
2.3 Relationship Between r and R² (Coefficient of Determination)
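The relationship is direct: squaring the correlation gives the coefficient of determination, the proportion of variance in Y that is linearly accounted for by X:

```latex
R^2 = r^2
```

For example, r = .50 gives R² = .25, meaning the two variables share 25% of their variance; this is the quantity tabulated in the R² column of the magnitude table in Section VI.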
III. Statistical Significance: The t-Test for r
A correlation coefficient computed from a sample is a statistic estimating the population parameter ρ (rho). The question of significance asks: is our observed r large enough to conclude that ρ ≠ 0?
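The standard test statistic is t = r·√(n − 2) / √(1 − r²), distributed with n − 2 degrees of freedom under the null hypothesis ρ = 0. The sketch below (synthetic data, assuming SciPy is available) computes this manually and checks it against the p-value from `scipy.stats.pearsonr`:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = rng.normal(size=40)
y = 0.5 * x + rng.normal(size=40)   # built-in positive correlation
n = len(x)

r, p_scipy = stats.pearsonr(x, y)

# Manual test statistic: t = r * sqrt(n - 2) / sqrt(1 - r^2), df = n - 2.
t = r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)
p_manual = 2 * stats.t.sf(abs(t), df=n - 2)

print(np.isclose(p_manual, p_scipy))  # the two p-values agree
```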
IV. Confidence Intervals: Fisher's z-Transformation
Because r is bounded between −1 and +1, its sampling distribution is not normal — especially for large |r|. Ronald Fisher (1915) introduced the z-transformation to normalise it:
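The transformation is z = arctanh(r) = ½·ln((1 + r)/(1 − r)), which is approximately normal with standard error 1/√(n − 3). A confidence interval is built on the z scale and back-transformed with tanh. A minimal sketch (the function name and sample values are our own, for illustration):

```python
import numpy as np
from scipy import stats

def pearson_ci(r, n, confidence=0.95):
    """CI for rho via Fisher's z-transformation (helper name is ours)."""
    z = np.arctanh(r)                  # Fisher's z = 0.5 * ln((1 + r) / (1 - r))
    se = 1.0 / np.sqrt(n - 3)          # approximate standard error of z
    crit = stats.norm.ppf(0.5 + confidence / 2.0)  # two-sided normal quantile
    return np.tanh(z - crit * se), np.tanh(z + crit * se)

lo, hi = pearson_ci(r=0.42, n=50)
print(round(lo, 3), round(hi, 3))  # interval reported back on the r scale
```

Note how the back-transformed interval is asymmetric around r = .42, reflecting the bounded, skewed sampling distribution of r.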
V. The Assumptions
1. Interval/Ratio Scale
Both variables must be measured on an interval or ratio scale. Pearson r cannot be used with ordinal data — use Spearman's ρ instead.
2. Linearity
The relationship between X and Y must be linear (or approximately so). Always check with a scatterplot before computing r.
3. Bivariate Normality
Both variables should be approximately normally distributed for the t-test p-value to be valid. The test is reasonably robust to violations when n ≥ 30, thanks to the Central Limit Theorem.
4. Absence of Outliers
A single outlier can dramatically inflate or deflate r. Inspect the scatterplot. Pearson r is not resistant to outliers — this is its principal weakness.
5. Homoscedasticity
The variance of Y should be approximately equal across all values of X (and vice versa). A fan-shape in the scatterplot indicates heteroscedasticity.
6. Independence
Each pair of observations (xᵢ, yᵢ) must be independent of all other pairs. Violated by repeated measures or time-series data.
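The outlier warning (assumption 4) is the easiest to demonstrate numerically. In this illustrative sketch, ten genuinely uncorrelated points plus one extreme point produce a large spurious r:

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(size=10)
y = rng.normal(size=10)          # independent of x: true correlation is zero
r_clean = np.corrcoef(x, y)[0, 1]

# Append a single extreme point far from the main cloud.
x_out = np.append(x, 10.0)
y_out = np.append(y, 10.0)
r_outlier = np.corrcoef(x_out, y_out)[0, 1]

print(round(r_clean, 2), round(r_outlier, 2))  # one point dominates r
```

This is why the scatterplot inspection urged above is not optional: a single influential point can manufacture a "strong" correlation out of noise.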
VI. Interpreting the Magnitude of r
Statistical significance and practical significance are not the same thing. A very large sample can make a trivially small r statistically significant. Always interpret the magnitude of r alongside the p-value.
| \|r\| Range | Cohen (1988) Label | R² Range | Practical Interpretation |
|---|---|---|---|
| .00 – .09 | Negligible | < .01 | Variables share virtually no linear variance. Relationship is trivially small. |
| .10 – .29 | Small | .01 – .08 | Weak but detectable relationship. Meaningful in large epidemiological studies. |
| .30 – .49 | Medium | .09 – .24 | Moderate relationship. Practically meaningful in most applied research contexts. |
| .50 – .69 | Large | .25 – .48 | Strong relationship. Variables share substantial variance. Easily observable. |
| .70 – .89 | Very Large | .49 – .79 | Very strong association. Common in psychometrics and measurement validation. |
| .90 – 1.00 | Near Perfect | .81 – 1.00 | Extremely strong. Rare in social sciences; common in physical measurement. |
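The caution about sample size can be made concrete: with a large enough n, even a "negligible" correlation clears the p < .05 bar. A sketch with synthetic data (assuming SciPy is available):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)
y = 0.05 * x + rng.normal(size=n)   # true correlation ~ .05: negligible

r, p = stats.pearsonr(x, y)
print(round(r, 3), p < 0.05)  # tiny r, yet decisively "significant"
```

By the table above this r is negligible (R² well under .01), which is exactly why magnitude must be read alongside the p-value.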
✅ The Logic of the Test: What "Significant" Really Means
When we say r = .42 is "significant at p < .05," we are saying: if the true population correlation were zero (ρ = 0), the probability of observing an r as large as .42 in a sample of this size is less than 5%. We are not saying "the correlation is strong" or "X causes Y." We are saying the correlation is probably not zero in the population. Significance tells you about sampling reliability, not practical importance.
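That definition can be checked by simulation: when ρ = 0, roughly 5% of samples should still produce p < .05 purely by chance. A minimal sketch (seed, sample size, and repetition count are arbitrary choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
reps, n = 2000, 30
false_positives = 0

for _ in range(reps):
    x = rng.normal(size=n)
    y = rng.normal(size=n)        # truly uncorrelated: rho = 0
    _, p = stats.pearsonr(x, y)
    false_positives += (p < 0.05)

rate = false_positives / reps
print(rate)  # close to the nominal 0.05 false-positive rate
```

The observed rate hovers near .05, confirming that "significant at p < .05" is a statement about the long-run behaviour of the test under the null, not about the strength or cause of the relationship.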