Simple Linear Regression
Regression & Correlation

Models the linear relationship between a single continuous predictor and a continuous outcome, estimating the slope and intercept of the best-fit line.
When to Use
Use this test when you want to predict a continuous outcome from a single continuous predictor and describe their linear relationship. For example, predicting body weight from height, or estimating how enzyme activity changes with substrate concentration.
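The height-and-weight example above can be sketched in a few lines. This is a minimal illustration, assuming SciPy is available; the data values are hypothetical, invented only to show the workflow.

```python
import numpy as np
from scipy import stats

# Hypothetical data: height (cm) as predictor, weight (kg) as outcome.
height = np.array([160, 165, 170, 175, 180, 185, 190], dtype=float)
weight = np.array([55.0, 60.5, 63.0, 68.5, 74.0, 78.5, 83.0])

result = stats.linregress(height, weight)
print(f"slope     = {result.slope:.3f} kg per cm")
print(f"intercept = {result.intercept:.3f} kg")
print(f"R-squared = {result.rvalue**2:.3f}")
print(f"p-value   = {result.pvalue:.4g}")

# Predict weight for a new height within the observed range.
new_height = 172.0
predicted = result.intercept + result.slope * new_height
```

Predictions should only be made for heights inside the observed range (160 to 190 cm here); see the extrapolation warning under Common Pitfalls.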
Assumptions
- The relationship between predictor and outcome is linear.
- Observations are independent.
- Residuals (errors) are normally distributed.
- Residuals have constant variance across all levels of the predictor (homoscedasticity).
- No influential outliers that distort the fitted line.
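A couple of these assumptions can be screened numerically. The sketch below (hypothetical data, assuming SciPy) runs a Shapiro-Wilk test on the residuals and compares residual spread across the two halves of X; it complements, but does not replace, a residual-vs-fitted plot.

```python
import numpy as np
from scipy import stats

# Hypothetical data for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1])

fit = stats.linregress(x, y)
residuals = y - (fit.intercept + fit.slope * x)

# Normality of residuals: Shapiro-Wilk (p > 0.05 gives no evidence
# against normality; note small samples have little power).
stat, p_normal = stats.shapiro(residuals)

# Rough homoscedasticity check: compare residual spread in the lower
# and upper halves of X. Similar SDs are consistent with constant
# variance; a residual-vs-fitted plot is more informative.
lower = residuals[x <= np.median(x)]
upper = residuals[x > np.median(x)]
print(f"Shapiro-Wilk p = {p_normal:.3f}")
print(f"residual SD (lower half) = {lower.std(ddof=1):.3f}")
print(f"residual SD (upper half) = {upper.std(ddof=1):.3f}")
```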
Required Inputs
| Input | Type | Notes |
|---|---|---|
| Predictor (X) | Numeric | Continuous independent variable |
| Outcome (Y) | Numeric | Continuous dependent variable |
Output Metrics
| Metric | What it means |
|---|---|
| R-Square | Proportion of variance in Y explained by X. Ranges from 0 to 1. |
| Adj R-Square | R-squared adjusted for the number of predictors. Slightly lower than R-squared even in simple regression with one predictor, unless the fit is perfect. |
| Root MSE | Root mean squared error of the residuals. Measures typical prediction error in the units of Y. |
| F Value | Overall F-statistic testing whether the model explains significant variance. |
| Pr > F | P-value for the overall model F-test. |
| N Observations | Number of data points used in the model. |
| Intercept — Estimate | Predicted value of Y when X = 0. |
| Intercept — Std Error | Standard error of the intercept estimate. |
| Intercept — t Value | Test statistic for the intercept. |
| Intercept — Pr > \|t\| | P-value testing whether the intercept differs from zero. |
| Slope — Estimate | Change in Y for a one-unit increase in X. |
| Slope — Std Error | Standard error of the slope estimate. |
| Slope — t Value | Test statistic for the slope. |
| Slope — Pr > \|t\| | P-value testing whether the slope differs from zero (i.e., whether X predicts Y). |
| 95% CL Lower | Lower bound of the 95% confidence interval for each parameter. |
| 95% CL Upper | Upper bound of the 95% confidence interval for each parameter. |
| Fitted Mean | Mean of fitted values from the model. |
| Fitted Min | Minimum fitted value from the model. |
| Fitted Max | Maximum fitted value from the model. |
| Residual Mean | Mean of model residuals (should be near zero). |
| Residual SD | Standard deviation of model residuals. |
| Residual Min | Minimum residual value. |
| Residual Max | Maximum residual value. |
Interpretation
- R-squared tells you what fraction of the variability in Y is accounted for by X. An R-squared of 0.60 means 60% of the variation is explained.
- The slope is the key result: it quantifies how much Y changes per unit change in X. If the slope is statistically significant (p < alpha), X is a significant predictor of Y.
- The intercept is the predicted Y when X = 0. It may or may not be meaningful depending on whether X = 0 is within the range of your data.
- Always examine a scatter plot with the fitted line and residual plots. The numbers alone cannot tell you if the linear model is appropriate.
- A high R-squared does not imply causation. Regression describes association, not causal mechanisms.
Common Pitfalls
- Fitting a straight line to a curved relationship produces misleading results. Always check for non-linearity in the residual plot.
- Extrapolating predictions beyond the observed range of X is unreliable. The linear relationship may not hold outside your data.
- A single outlier with high leverage (extreme X value) can dramatically change the slope. Check for influential points.
- Correlation does not imply causation. A significant regression does not prove that X causes Y.
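The leverage pitfall is easy to demonstrate. In this sketch (hypothetical data, assuming SciPy), a single point with an extreme X value and an off-line Y value pulls the fitted slope far from the value the other five points support.

```python
import numpy as np
from scipy import stats

# Five points lying close to a line with slope ~2.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.9])

slope_clean = stats.linregress(x, y).slope

# Add one high-leverage point: extreme X, with Y far below the trend
# (the original line would predict roughly 40 at x = 20).
x_out = np.append(x, 20.0)
y_out = np.append(y, 10.0)
slope_out = stats.linregress(x_out, y_out).slope

print(f"slope without outlier: {slope_clean:.2f}")
print(f"slope with outlier:    {slope_out:.2f}")
```

One discordant point out of six drags the slope from about 2 down to well under 1, which is why checking for influential points belongs in every analysis.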
How It Works
- Find the line Y = a + bX that minimises the sum of squared vertical distances (residuals) between the observed and predicted Y values (ordinary least squares).
- The slope b = sum((Xi - mean(X)) * (Yi - mean(Y))) / sum((Xi - mean(X))^2), and the intercept a = mean(Y) - b * mean(X).
- Test whether the slope is significantly different from zero using a t-test with N - 2 degrees of freedom.
- Compute R-squared as 1 - (residual sum of squares / total sum of squares).
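The steps above translate directly into code. This is a from-scratch sketch of the same formulas using only the standard library; in practice a vetted routine such as `scipy.stats.linregress` is preferable.

```python
import math

def simple_ols(x, y):
    """Ordinary least squares for one predictor, following the
    formulas above: slope b, intercept a, R-squared, and the
    t statistic for the slope on N - 2 degrees of freedom."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    # Slope b = Sxy / Sxx, intercept a = mean(Y) - b * mean(X).
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    # R-squared = 1 - SS_residual / SS_total.
    fitted = [a + b * xi for xi in x]
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    # Standard error of the slope and its t statistic (df = n - 2).
    se_b = math.sqrt(ss_res / (n - 2) / sxx)
    t = b / se_b
    return {"slope": b, "intercept": a, "r_squared": r_squared,
            "t": t, "df": n - 2}

# Hypothetical data: nearly linear with slope ~2.
res = simple_ols([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.1, 5.9, 8.2, 9.9])
```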