Pearson Correlation
Regression & Correlation
Measures the strength and direction of the linear relationship between two continuous variables, producing a correlation coefficient (r) that ranges from -1 to +1.
When to Use
Use this test when you want to quantify how strongly two continuous variables are linearly related. For example, measuring the association between study hours and exam scores, or between temperature and ice cream sales.
Assumptions
- Both variables are continuous (interval or ratio scale).
- The relationship between the two variables is linear.
- The data follow an approximate bivariate normal distribution.
- The data contain no extreme outliers that could distort the correlation.
- Observations are independent.
Required Inputs
| Input | Type | Notes |
|---|---|---|
| Variable 1 | Numeric | First continuous variable |
| Variable 2 | Numeric | Second continuous variable |
Output Metrics
| Metric | What it means |
|---|---|
| Pearson r | Correlation coefficient. Ranges from -1 (perfect negative) to +1 (perfect positive). 0 indicates no linear relationship. |
| r-squared | Coefficient of determination: proportion of variance shared between the two variables. |
| t-statistic | Test statistic for the null hypothesis that r = 0. |
| DF | Degrees of freedom (N - 2). |
| p-value | P-value for the two-tailed test of r = 0. |
| 95% CI Lower | Lower bound of the 95% confidence interval for r (Fisher z-transform). |
| 95% CI Upper | Upper bound of the 95% confidence interval for r (Fisher z-transform). |
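As a sketch of how these metrics could be computed with standard scientific Python (the data below are invented for illustration; `scipy.stats.pearsonr` returns r and the two-tailed p-value):

```python
import numpy as np
from scipy import stats

# Hypothetical example data: study hours vs. exam scores
hours = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
score = np.array([52, 55, 61, 58, 70, 72, 75, 80], dtype=float)

r, p_value = stats.pearsonr(hours, score)  # Pearson r and two-tailed p-value
n = len(hours)
df = n - 2                                 # degrees of freedom
t_stat = r * np.sqrt(df / (1 - r**2))      # t-statistic for H0: r = 0
r_squared = r**2                           # coefficient of determination
```

The remaining metrics (the 95% CI bounds) come from the Fisher z-transformation described under Interpretation below.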
Interpretation
- The sign of r indicates the direction: positive means both variables increase together; negative means one increases as the other decreases.
- Effect size thresholds for |r|: weak (0.1-0.3), moderate (0.3-0.5), strong (> 0.5). These are guidelines, not strict cutoffs.
- r-squared tells you the proportion of variance shared. An r of 0.5 means r-squared = 0.25, so only 25% of the variance is shared.
- The confidence interval for r is computed using the Fisher z-transformation, which is necessary because r has a bounded and skewed sampling distribution.
- A significant correlation does not imply causation. Two variables can be correlated because they share a common cause.
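The Fisher z-transform CI mentioned above can be sketched in a few lines (a minimal sketch; the critical value 1.96 assumes a 95% interval and a large-sample normal approximation on the z scale):

```python
import math

def fisher_ci_95(r, n):
    """Approximate 95% CI for Pearson r via the Fisher z-transformation."""
    z = math.atanh(r)                        # Fisher z-transform of r
    se = 1.0 / math.sqrt(n - 3)              # standard error on the z scale
    lo_z, hi_z = z - 1.96 * se, z + 1.96 * se
    return math.tanh(lo_z), math.tanh(hi_z)  # back-transform to the r scale
```

Note that the resulting interval is asymmetric around r on the original scale, which is exactly why the transformation is needed.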
Common Pitfalls
- Pearson r only measures linear association. Two variables can have a strong non-linear relationship with r near zero.
- A single outlier can dramatically inflate or deflate the correlation, especially with small samples. Always plot your data.
- Restriction of range (when one variable has limited variability) attenuates the observed correlation below its true value.
- Correlating aggregated data (group means rather than individual observations) produces inflated correlations (ecological fallacy).
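The first pitfall is easy to demonstrate: a perfect quadratic relationship that is symmetric about zero produces r of essentially zero, even though y is completely determined by x (an illustrative sketch):

```python
import numpy as np

x = np.linspace(-1, 1, 101)
y = x**2                     # y is a deterministic function of x
r = np.corrcoef(x, y)[0, 1]  # yet the *linear* correlation is ~0
```

Plotting x against y would reveal the relationship instantly, which is why the advice above is to always plot your data.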
How It Works
- Standardise each variable by subtracting its mean and dividing by its standard deviation.
- Compute r as the average product of the standardised scores: r = sum(z_x * z_y) / (N - 1).
- Test significance with t = r * sqrt((N-2) / (1-r^2)), which follows a t-distribution with N-2 degrees of freedom.
- Construct the confidence interval by transforming r to Fisher z, computing the CI on the z scale, then back-transforming to the r scale.
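The computational steps above can be sketched end to end in plain Python (illustrative only; the example data are invented):

```python
import math

def pearson_r(x, y):
    """Pearson r via standardised scores, plus its t-statistic (steps 1-3)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sample standard deviations (N - 1 denominator)
    sd_x = math.sqrt(sum((v - mean_x) ** 2 for v in x) / (n - 1))
    sd_y = math.sqrt(sum((v - mean_y) ** 2 for v in y) / (n - 1))
    # Average product of the standardised scores
    r = sum(((a - mean_x) / sd_x) * ((b - mean_y) / sd_y)
            for a, b in zip(x, y)) / (n - 1)
    t = r * math.sqrt((n - 2) / (1 - r ** 2))  # t with N - 2 df
    return r, t

r, t = pearson_r([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
```

The p-value then comes from the t-distribution with N - 2 degrees of freedom, and the CI from the Fisher z procedure shown under Interpretation.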
References
- Pearson, K. (1896). Mathematical contributions to the theory of evolution. III. Regression, heredity, and panmixia. Philosophical Transactions of the Royal Society A, 187, 253-318.
- Fisher, R. A. (1921). On the "probable error" of a coefficient of correlation deduced from a small sample. Metron, 1, 3-32.