Mediation Analysis (Hayes Model 4)

Tests whether the effect of an independent variable (X) on an outcome (Y) is transmitted through one or more mediating variables (M), decomposing the total effect into direct and indirect components.

When to Use

Use this analysis when you have a theoretical reason to believe that X influences Y through an intermediary mechanism M. For example, testing whether a training programme (X) improves job performance (Y) through increased self-efficacy (M), or whether drug treatment (X) reduces tumour size (Y) by suppressing a specific biomarker (M).

Assumptions

Correct causal ordering: X precedes M, which precedes Y in time or logic.
No unmeasured confounders of the X-M, M-Y, or X-Y relationships.
For continuous outcomes: linearity of all regression paths.
Residuals are independent. For parametric inference they are also assumed to be normally distributed; the bootstrap relaxes the normality requirement but not independence.
Adequate sample size for bootstrap confidence intervals (N >= 50, ideally N >= 100).

Required Inputs

Input	Type	Notes
Independent Variable (X)	Numeric / Categorical	The predictor variable
Mediator (M)	Numeric / Categorical	The proposed mediating variable
Dependent Variable (Y)	Numeric / Categorical	The outcome variable
Covariates (optional)	Numeric / Categorical	Optional control variables

Parameter	Default	Options
Bootstrap Samples	5000	1000 - 10000

Output Metrics

Metric	What it means
Outcome Variable	Name of the dependent variable.
Predictor Variable	Name of the independent variable.
Mediator Variable	Name of the mediating variable.
N	Sample size.
N Bootstrap	Number of bootstrap resamples used.
Mediator Model R-squared	R-squared for the regression of M on X (path a model).
Mediator Model F	F-statistic for the mediator model.
Path a (X to M)	Coefficient for the effect of X on M. With SE, t, and p-value.
Outcome Model R-squared	R-squared for the regression of Y on X and M (paths b and c' model).
Outcome Model F	F-statistic for the outcome model.
Path b (M to Y, controlling for X)	Coefficient for the effect of M on Y, holding X constant.
Path c' (Direct Effect)	Effect of X on Y after controlling for M. With SE, t, p, and bootstrap CI.
Indirect Effect (a*b)	Product of paths a and b. The effect of X on Y transmitted through M. With bootstrap SE and CI.
Total Effect (c)	Total effect of X on Y (direct + indirect). With SE, t, and p-value.
Proportion Mediated	Indirect effect / total effect. The fraction of the total effect that passes through M. With bootstrap CI.
Sobel Test Effect	Point estimate of the indirect effect (a*b) evaluated by the Sobel test (normal-theory approximation).
Sobel Test SE	Normal-theory standard error of the indirect effect used in the Sobel test.
Sobel Test z	Z-statistic for the Sobel test.
Sobel Test p	P-value for the Sobel test.
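
The four Sobel rows can be reproduced by hand from the two path coefficients and their standard errors. The sketch below assumes the first-order (delta-method) standard error from Sobel (1982); the function name `sobel_test` and the input values are hypothetical and are not taken from this tool's implementation.

```python
import math

def sobel_test(a, se_a, b, se_b):
    """Return (effect, se, z, p) for the indirect effect a*b.

    Assumes the first-order Sobel standard error:
    se = sqrt(b^2 * se_a^2 + a^2 * se_b^2).
    """
    effect = a * b
    se = math.sqrt(b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2)
    z = effect / se
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return effect, se, z, p

# Hypothetical path estimates, for illustration only.
effect, se, z, p = sobel_test(a=0.5, se_a=0.1, b=0.4, se_b=0.1)
print(f"effect={effect:.3f} se={se:.4f} z={z:.3f} p={p:.4f}")
```

Because the p-value relies on a*b being normally distributed, it tends to be conservative in small samples, which is why the bootstrap interval is usually reported alongside it.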

Interpretation

If the bootstrap confidence interval for the indirect effect (a*b) excludes zero, the indirect effect is statistically significant at the level implied by the interval (for example, alpha = .05 for a 95% CI). This is the recommended test.
The Sobel test is a traditional alternative, but it assumes that the sampling distribution of a*b is normal. That distribution is typically skewed in small to moderate samples, so prefer the bootstrap CI.
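The direct effect (c') represents the portion of X's effect on Y that does not pass through M. If c' is non-significant but the indirect effect is significant, the pattern is conventionally described as full mediation, although a non-significant c' does not show that the direct effect is exactly zero.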
Proportion mediated tells you how much of the total effect goes through the mediator. For example, a proportion of 0.40 means 40% of the effect is mediated.
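A significant total effect (c) is not required for the indirect effect to be significant. When the direct and indirect effects have opposite signs (inconsistent mediation, also called suppression), they can partly or fully cancel, leaving a total effect near zero even though the indirect effect is non-zero.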
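
The arithmetic behind these points can be sketched in a few lines. The example assumes linear models for both M and Y, so that the total effect equals the direct effect plus the product a*b; the function name `decompose` and all coefficient values are hypothetical.

```python
def decompose(a, b, c_prime):
    """Return (indirect, total) given paths a, b and the direct effect c'."""
    indirect = a * b
    total = c_prime + indirect   # c = c' + a*b for linear models
    return indirect, total

# Direct and indirect effects share a sign: part of the effect runs through M.
indirect, total = decompose(a=0.5, b=0.4, c_prime=0.3)
proportion = indirect / total
print(f"indirect={indirect:.2f} total={total:.2f} proportion={proportion:.2f}")

# Opposite signs: the same indirect effect is cancelled by the direct effect.
indirect_opp, total_opp = decompose(a=0.5, b=0.4, c_prime=-0.2)
print(f"indirect={indirect_opp:.2f} total={total_opp:.2f}")
```

In the second case the indirect effect is unchanged, yet a test of the total effect alone would find nothing, which is why the total effect is not used as a gatekeeper for mediation.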

Common Pitfalls

Mediation analysis cannot prove causation from cross-sectional data. Longitudinal designs with temporal ordering provide stronger evidence.
Unmeasured confounders of the M-Y relationship are a major threat. Sensitivity analyses (e.g., Imai's rho) can assess how robust the findings are.
The proportion mediated is unstable when the total effect is near zero. Avoid interpreting it as a precise quantity in such cases.
Mediators operating in series require a more complex model (e.g., Hayes Model 6 for serial mediation). In PROCESS, Model 4 can include several mediators in parallel, but the inputs and outputs described here cover a single mediator path.
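
The instability of the proportion mediated is easy to see numerically. The sketch below holds a hypothetical indirect effect fixed and varies only the direct effect; the helper name `proportion_mediated` and the values are illustrative assumptions.

```python
def proportion_mediated(indirect, direct):
    """Indirect effect divided by the total effect (NaN if the total is zero)."""
    total = indirect + direct
    if total == 0:
        return float("nan")
    return indirect / total

indirect = 0.20  # hypothetical indirect effect, held fixed
for direct in (0.30, -0.10, -0.18, -0.22):
    pm = proportion_mediated(indirect, direct)
    print(f"direct={direct:+.2f} total={indirect + direct:+.2f} proportion={pm:+.2f}")
```

As the total effect approaches zero, the ratio leaves the 0 to 1 range and flips sign after a small change in the direct effect, so in that situation it is better to report the indirect effect itself with its confidence interval.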

How It Works

Fit the mediator model: regress M on X (and covariates) to obtain path a.
Fit the outcome model: regress Y on both X and M (and covariates) to obtain path b (M to Y) and path c' (direct effect of X on Y).
Compute the indirect effect as the product a * b.
Use bootstrapping: resample the data many times, re-estimate the indirect effect each time, and construct a percentile confidence interval from the bootstrap distribution.
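
The steps above can be sketched with the Python standard library alone. This is a minimal illustration, not this tool's actual implementation: it assumes one numeric X, one numeric M, a continuous Y and no covariates, and the function names (`cov`, `fit_paths`, `bootstrap_indirect`) and the simulated data are hypothetical.

```python
import random

def cov(u, v):
    """Sample covariance of two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    return sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v)) / (n - 1)

def fit_paths(x, m, y):
    """OLS slopes for M ~ X and Y ~ X + M (one X, one M, no covariates)."""
    sxx, smm, sxm = cov(x, x), cov(m, m), cov(x, m)
    sxy, smy = cov(x, y), cov(m, y)
    a = sxm / sxx                             # mediator model: M on X
    det = sxx * smm - sxm ** 2                # zero only if X and M are collinear
    b = (sxx * smy - sxm * sxy) / det         # outcome model: M to Y, holding X
    c_prime = (smm * sxy - sxm * smy) / det   # outcome model: X to Y, holding M
    return a, b, c_prime

def bootstrap_indirect(x, m, y, n_boot=5000, level=0.95, seed=1):
    """Percentile bootstrap confidence interval for the indirect effect a*b."""
    rng = random.Random(seed)
    n = len(x)
    draws = []
    for _ in range(n_boot):
        # Resample whole rows with replacement so X, M and Y stay paired.
        idx = [rng.randrange(n) for _ in range(n)]
        a, b, _ = fit_paths([x[i] for i in idx],
                            [m[i] for i in idx],
                            [y[i] for i in idx])
        draws.append(a * b)
    draws.sort()
    tail = (1 - level) / 2
    lower = draws[int(tail * n_boot)]
    upper = draws[min(n_boot - 1, int((1 - tail) * n_boot))]
    return lower, upper

# Simulated data with true a = 0.5, b = 0.4, c' = 0.2 (illustration only).
rng = random.Random(42)
n = 300
x = [rng.gauss(0, 1) for _ in range(n)]
m = [0.5 * xi + rng.gauss(0, 1) for xi in x]
y = [0.2 * xi + 0.4 * mi + rng.gauss(0, 1) for xi, mi in zip(x, m)]

a, b, c_prime = fit_paths(x, m, y)
indirect = a * b
lower, upper = bootstrap_indirect(x, m, y, n_boot=1000)
print(f"a={a:.3f} b={b:.3f} c'={c_prime:.3f} indirect={indirect:.3f}")
print(f"95% percentile CI for a*b: [{lower:.3f}, {upper:.3f}]")
```

Resampling row indices rather than each variable separately preserves the joint distribution of X, M and Y, which is what makes the bootstrap distribution of a*b meaningful. The demo uses 1000 resamples only to keep the run short; the default listed above is 5000.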

Citations

Baron, R. M., & Kenny, D. A. (1986). The moderator-mediator variable distinction in social psychological research. Journal of Personality and Social Psychology, 51(6), 1173-1182.
Sobel, M. E. (1982). Asymptotic confidence intervals for indirect effects in structural equation models. Sociological Methodology, 13, 290-312.
Hayes, A. F. (2022). Introduction to Mediation, Moderation, and Conditional Process Analysis (3rd ed.). Guilford Press (PROCESS framework).
