# Multiple Linear Regression
Models a continuous outcome as a linear function of two or more predictors, estimating the independent contribution of each predictor while controlling for the others.
## When to Use
Use this test when you want to predict a continuous outcome from two or more predictors and understand the unique contribution of each. For example, predicting blood pressure from age, BMI, and sodium intake simultaneously, or modelling crop yield from temperature, rainfall, and fertiliser application.
## Assumptions
- The relationship between each predictor and the outcome is linear (after controlling for other predictors).
- Observations are independent.
- Residuals are normally distributed.
- Residuals have constant variance (homoscedasticity).
- No perfect multicollinearity among predictors (VIF < 10 as a guideline).
## Required Inputs
| Input | Type | Notes |
|---|---|---|
| Outcome (Y) | Numeric | Continuous dependent variable |
| Predictors (X1, X2, ...) | Numeric | Two or more continuous independent variables |
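A minimal sketch of how these inputs are assembled and fitted, using numpy with made-up example data (the variable names and simulated values are illustrative, not part of any real dataset):

```python
import numpy as np

# Hypothetical example data: continuous outcome y and two predictors x1, x2.
rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=0.1, size=n)

# Design matrix: a leading column of ones for the intercept, then one
# column per predictor.
X = np.column_stack([np.ones(n), x1, x2])

# Ordinary least squares fit.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)  # roughly [1.0, 2.0, -0.5], recovering the simulated coefficients
```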
## Output Metrics
| Metric | What it means |
|---|---|
| R-Square | Proportion of variance in Y explained by all predictors combined. |
| Adj R-Square | R-squared adjusted for the number of predictors. Penalises adding non-informative predictors. |
| Root MSE | Root mean squared error of residuals. |
| F Value | Overall F-statistic testing whether the model explains significant variance. |
| Pr > F | P-value for the overall F-test. |
| N Observations | Number of data points. |
| Estimate | Estimated regression coefficient for each predictor (and the intercept). |
| Std Error | Standard error of each coefficient. |
| t Value | Test statistic for each coefficient. |
| Pr > \|t\| | P-value testing whether each coefficient differs from zero. |
| 95% CL Lower | Lower confidence limit for each coefficient. |
| 95% CL Upper | Upper confidence limit for each coefficient. |
| VIF | Variance Inflation Factor for each predictor. VIF > 10 suggests problematic multicollinearity. |
| Fitted Mean | Mean of fitted values from the model. |
| Fitted Min | Minimum fitted value from the model. |
| Fitted Max | Maximum fitted value from the model. |
| Residual Mean | Mean of model residuals (should be near zero). |
| Residual SD | Standard deviation of model residuals. |
| Residual Min | Minimum residual value. |
| Residual Max | Maximum residual value. |
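The fit-quality metrics in the table above follow directly from the residuals. A sketch of how R-Square, Adj R-Square, Root MSE, and the overall F-test are computed, assuming simulated data (one informative predictor, one uninformative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, k = 60, 2                                      # observations, predictors
X_raw = rng.normal(size=(n, k))
y = 3.0 + 1.5 * X_raw[:, 0] + rng.normal(size=n)  # second predictor is noise

X = np.column_stack([np.ones(n), X_raw])          # add intercept column
p = X.shape[1]                                    # parameters incl. intercept

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ coef

sse = np.sum(resid**2)                 # error sum of squares
sst = np.sum((y - y.mean())**2)        # total sum of squares
r2 = 1 - sse / sst
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p)
root_mse = np.sqrt(sse / (n - p))

# Overall F-test: does the model explain more variance than the mean alone?
f_value = ((sst - sse) / k) / (sse / (n - p))
pr_f = stats.f.sf(f_value, k, n - p)

print(f"R-Square={r2:.3f}  Adj R-Square={adj_r2:.3f}  Root MSE={root_mse:.3f}")
print(f"F Value={f_value:.2f}  Pr > F={pr_f:.4g}")
```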
## Interpretation
- Each coefficient represents the expected change in Y for a one-unit increase in that predictor, holding all other predictors constant.
- Adjusted R-squared is more appropriate than R-squared for comparing models with different numbers of predictors.
- The overall F-test tells you whether the set of predictors as a whole explains significant variance. Individual t-tests tell you which predictors contribute uniquely.
- VIF values flag multicollinearity. A VIF of 5 means the sampling variance of that predictor's coefficient is inflated 5-fold relative to what it would be if the predictor were uncorrelated with the others.
- Standardised coefficients (beta weights) allow comparing the relative importance of predictors measured on different scales.
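One common way to obtain beta weights is to z-score the outcome and all predictors before refitting; the sketch below assumes hypothetical blood-pressure data where the two predictors sit on very different scales:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
age = rng.normal(50, 10, n)        # years
sodium = rng.normal(3000, 800, n)  # mg/day -- a much larger scale
bp = 80 + 0.5 * age + 0.01 * sodium + rng.normal(scale=5, size=n)

def standardized_betas(y, X_raw):
    # z-score outcome and predictors, then refit; the intercept drops to ~0
    # and each coefficient becomes an effect in SD units.
    Z = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0, ddof=1)
    yz = (y - y.mean()) / y.std(ddof=1)
    A = np.column_stack([np.ones(len(yz)), Z])
    coef, *_ = np.linalg.lstsq(A, yz, rcond=None)
    return coef[1:]                # beta weights, one per predictor

betas = standardized_betas(bp, np.column_stack([age, sodium]))
# The raw coefficients (0.5 vs 0.01) are incomparable across units;
# beta weights put both predictors on a common SD scale.
print(betas)
```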
## Common Pitfalls
- Adding too many predictors relative to sample size leads to overfitting. A common guideline is at least 10-20 observations per predictor.
- Multicollinearity does not bias coefficients but inflates their standard errors, making individual predictors appear non-significant even when the overall model is strong.
- Omitting an important predictor (omitted variable bias) can make included predictors appear significant when they are merely proxies for the omitted variable.
- Non-linear relationships will produce biased estimates if modelled as linear. Check residual plots for curvature.
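As a rough numeric stand-in for eyeballing a residual plot, one can regress the residuals on a squared term: under a correctly specified linear model that slope should sit near zero. A sketch with deliberately quadratic simulated data:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
x = rng.uniform(-3, 3, n)
y = 1.0 + x**2 + rng.normal(scale=0.3, size=n)  # truly quadratic in x

# Fit a straight line anyway.
X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ coef

# Curvature check: slope of residuals regressed on x^2.
# Near zero for a well-specified linear model; clearly non-zero here.
slope = np.polyfit(x**2, resid, 1)[0]
print(slope)
```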
## How It Works
- Model: Y = b0 + b1*X1 + b2*X2 + ... + bk*Xk + error.
- Find the coefficient vector that minimises the sum of squared residuals using ordinary least squares (matrix solution: b = (X'X)^(-1) X'Y).
- Test each coefficient with a t-test and the overall model with an F-test.
- Compute VIF for each predictor as 1 / (1 - R-squared of that predictor regressed on all other predictors).
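The steps above can be sketched end to end in numpy: solve the normal equations for the coefficients, then compute each predictor's VIF by regressing it on the others. The data here are simulated, with x2 deliberately correlated with x1 so the VIFs exceed 1:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)  # correlated with x1 (r ~ 0.8)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])

# OLS via the normal equations: b = (X'X)^(-1) X'Y.
# np.linalg.solve is used rather than an explicit inverse for stability.
b = np.linalg.solve(X.T @ X, X.T @ y)

def vif(X_pred):
    """VIF per predictor: 1 / (1 - R^2 of it regressed on the others)."""
    out = []
    for j in range(X_pred.shape[1]):
        others = np.delete(X_pred, j, axis=1)
        A = np.column_stack([np.ones(len(X_pred)), others])
        coef, *_ = np.linalg.lstsq(A, X_pred[:, j], rcond=None)
        resid = X_pred[:, j] - A @ coef
        sst = np.sum((X_pred[:, j] - X_pred[:, j].mean()) ** 2)
        r2 = 1 - (resid @ resid) / sst
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

vifs = vif(np.column_stack([x1, x2]))
print(b)     # roughly [1, 2, -1]
print(vifs)  # both well above 1, reflecting the x1-x2 correlation
```

With only two predictors the two VIFs coincide, since each is 1 / (1 - r²) for the same pairwise correlation.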