Simple Linear Regression

Regression & Correlation

Models the linear relationship between a single continuous predictor and a continuous outcome, estimating the slope and intercept of the best-fit line.

When to Use

Use this test when you want to predict a continuous outcome from a single continuous predictor and describe their linear relationship. For example, predicting body weight from height, or estimating how enzyme activity changes with substrate concentration.

Assumptions

  • The relationship between predictor and outcome is linear.
  • Observations are independent.
  • Residuals (errors) are normally distributed.
  • Residuals have constant variance across all levels of the predictor (homoscedasticity).
  • No influential outliers that distort the fitted line.

Required Inputs

Input | Type | Notes
Predictor (X) | Numeric | Continuous independent variable
Outcome (Y) | Numeric | Continuous dependent variable

Output Metrics

Metric | What it means
R-Square | Proportion of variance in Y explained by X. Ranges from 0 to 1.
Adj R-Square | R-squared adjusted for the number of predictors (identical to R-squared for simple regression with one predictor).
Root MSE | Root mean squared error of the residuals. Measures typical prediction error in the units of Y.
F Value | Overall F-statistic testing whether the model explains significant variance.
Pr > F | P-value for the overall model F-test.
N Observations | Number of data points used in the model.
Intercept — Estimate | Predicted value of Y when X = 0.
Intercept — Std Error | Standard error of the intercept estimate.
Intercept — t Value | Test statistic for the intercept.
Intercept — Pr > |t| | P-value testing whether the intercept differs from zero.
Slope — Estimate | Change in Y for a one-unit increase in X.
Slope — Std Error | Standard error of the slope estimate.
Slope — t Value | Test statistic for the slope.
Slope — Pr > |t| | P-value testing whether the slope differs from zero (i.e., whether X predicts Y).
95% CL Lower | Lower bound of the 95% confidence interval for each parameter.
95% CL Upper | Upper bound of the 95% confidence interval for each parameter.
Fitted Mean | Mean of fitted values from the model.
Fitted Min | Minimum fitted value from the model.
Fitted Max | Maximum fitted value from the model.
Residual Mean | Mean of model residuals (should be near zero).
Residual SD | Standard deviation of model residuals.
Residual Min | Minimum residual value.
Residual Max | Maximum residual value.

Interpretation

  • R-squared tells you what fraction of the variability in Y is accounted for by X. An R-squared of 0.60 means 60% of the variation is explained.
  • The slope is the key result: it quantifies how much Y changes per unit change in X. If the slope is statistically significant (p < alpha), X is a significant predictor of Y.
  • The intercept is the predicted Y when X = 0. It is interpretable only when X = 0 falls within or near the observed range of the data; otherwise it is just the mathematical anchor of the line.
  • Always examine a scatter plot with the fitted line and residual plots. The numbers alone cannot tell you if the linear model is appropriate.
  • A high R-squared does not imply causation. Regression describes association, not causal mechanisms.

Common Pitfalls

  • Fitting a straight line to a curved relationship produces misleading results. Always check for non-linearity in the residual plot.
  • Extrapolating predictions beyond the observed range of X is unreliable. The linear relationship may not hold outside your data.
  • A single outlier with high leverage (extreme X value) can dramatically change the slope. Check for influential points.
  • Correlation does not imply causation. A significant regression does not prove that X causes Y.

How It Works

  1. Find the line Y = a + bX that minimises the sum of squared vertical distances (residuals) between the observed and predicted Y values (ordinary least squares).
  2. The slope b = sum((Xi - mean(X)) * (Yi - mean(Y))) / sum((Xi - mean(X))^2), and the intercept a = mean(Y) - b * mean(X).
  3. Test whether the slope is significantly different from zero using a t-test with N - 2 degrees of freedom.
  4. Compute R-squared as 1 - (residual sum of squares / total sum of squares).
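The steps above can be sketched in plain Python. This is a from-scratch illustration, not a production routine; the helper name and test data are invented.

```python
import math

def simple_linear_regression(x, y):
    """OLS fit of y = a + b*x, following the four steps above."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx                        # step 2: slope
    a = my - b * mx                      # step 2: intercept
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((yi - my) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot      # step 4
    mse = ss_res / (n - 2)               # residual variance on N - 2 df
    se_b = math.sqrt(mse / sxx)          # standard error of the slope
    t_b = b / se_b                       # step 3: t statistic for the slope
    # Converting t_b to a p-value needs the t distribution with N - 2 df
    # (not in the standard library; e.g. scipy.stats.t.sf).
    return {"intercept": a, "slope": b, "r_squared": r_squared,
            "root_mse": math.sqrt(mse), "t_slope": t_b}

# Invented data roughly following y = 2x:
fit = simple_linear_regression([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
print(fit)
```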

References

  • Legendre, A. M. (1805). Nouvelles méthodes pour la détermination des orbites des comètes. Firmin Didot.
  • Gauss, C. F. (1809). Theoria motus corporum coelestium. Perthes et Besser.