Cox Proportional Hazards Regression

Survival Analysis

Models the effect of one or more covariates on the hazard rate (instantaneous risk of the event) without specifying the baseline hazard function, producing hazard ratios as the primary output.

When to Use

Use this model when you want to determine which factors predict longer or shorter survival while adjusting for confounders. For example, you might model the effect of treatment, age, and tumour stage on overall survival in cancer patients.

Assumptions

  • Proportional hazards: the hazard ratio between any two subjects remains constant over time. Assess with Schoenfeld residuals or log-log survival plots.
  • Censoring is non-informative.
  • The log-hazard is a linear function of the covariates.
  • Observations (subjects) are independent.
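
As a rough illustration of the Schoenfeld residual check mentioned above, the sketch below computes unscaled residuals for a one-covariate model: at each event time, the covariate of the subject who had the event minus the risk-weighted average covariate among subjects still at risk. The function name, the toy data, and the value of `beta_hat` are assumptions made for this example only. Under proportional hazards the residuals should show no trend when plotted against time.

```python
import math

def schoenfeld_residuals(times, events, x, beta):
    """Return (event_time, residual) pairs for a one-covariate Cox model."""
    residuals = []
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i != 1:
            continue  # residuals are defined only at observed event times
        # Risk set: everyone whose follow-up time is at least t_i.
        risk_set = [j for j, t_j in enumerate(times) if t_j >= t_i]
        weights = [math.exp(beta * x[j]) for j in risk_set]
        expected_x = sum(w * x[j] for w, j in zip(weights, risk_set)) / sum(weights)
        residuals.append((t_i, x[i] - expected_x))
    return residuals

# Toy data, invented for illustration.
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 0, 1, 1]   # 1 = event, 0 = censored
x = [1, 0, 1, 1, 0, 0]        # a single binary covariate
beta_hat = 0.834              # hypothetical fitted coefficient for these toy data

residuals = schoenfeld_residuals(times, events, x, beta_hat)
for t, r in residuals:
    print(f"t={t}: residual={r:+.3f}")
```

With real data, statistical packages typically scale these residuals and test for a trend over time; this sketch only shows where the residuals come from.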

Required Inputs

Input | Type | Notes
Time to Event | Numeric | Follow-up time until the event or censoring
Event Indicator | Binary (0/1) | 1 = event, 0 = censored
Covariates | Numeric / Categorical | One or more predictor variables

Output Metrics

Metric | What it means
Log-Likelihood | Partial log-likelihood of the fitted model.
AIC | Akaike Information Criterion for model comparison.
Concordance C-Index | Discrimination measure: probability that for a random pair of subjects, the one with the higher predicted risk experiences the event first. 0.5 = no discrimination, 1.0 = perfect.
Likelihood Ratio Chi-Sq | Global test comparing the model to the null model.
Wald Chi-Sq | Alternative global test based on Wald statistics.
Score Chi-Sq | Score test (efficient score) for the global null hypothesis.
Coefficient (log HR) | Log hazard ratio for each covariate.
SE | Standard error of each coefficient.
z | Z-statistic (coefficient / SE).
Pr > |z| | P-value for each covariate.
Hazard Ratio (HR) | Exponentiated coefficient. HR > 1 = increased risk; HR < 1 = protective effect.
HR 95% CL Lower | Lower confidence limit for the hazard ratio.
HR 95% CL Upper | Upper confidence limit for the hazard ratio.
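
The per-covariate rows in the table are linked by simple arithmetic. The sketch below derives z, the two-sided p-value, the hazard ratio, and a Wald-type 95% interval from a coefficient and its standard error. It assumes the confidence limits are computed on the log scale and then exponentiated, which is a common convention but not necessarily how every tool reports them. The function name and the input values are hypothetical.

```python
import math

def wald_summary(coef, se, z_crit=1.959963984540054):
    """Derive per-covariate summary statistics from a log hazard ratio and its SE."""
    z = coef / se
    # Two-sided p-value under the standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return {
        "z": z,
        "p_value": p_value,
        "hazard_ratio": math.exp(coef),
        # Wald interval: build it on the log scale, then exponentiate.
        "hr_lower": math.exp(coef - z_crit * se),
        "hr_upper": math.exp(coef + z_crit * se),
    }

# Hypothetical coefficient and standard error, chosen only for illustration.
summary = wald_summary(coef=0.5, se=0.2)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

Because the interval is symmetric on the log scale, it is not symmetric around the hazard ratio itself.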

Interpretation

  • A hazard ratio > 1 means the covariate is associated with increased risk (shorter survival). HR < 1 means reduced risk (longer survival).
  • The HR confidence interval is the key result: if the 95% interval includes 1.0, the covariate is not statistically significant at the 5% level.
  • The concordance index (C-index) measures how well the model discriminates between subjects who experience the event sooner versus later. Values of 0.7-0.8 are often considered good, although what counts as adequate depends on the application.
  • Check the proportional hazards assumption. If violated, consider stratified Cox models, time-varying coefficients, or alternative models.
  • Unlike logistic regression, Cox regression models the rate at which events occur (hazard), not the probability of the event.
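
To make the C-index concrete, here is a minimal sketch of Harrell's concordance calculation. It assumes a pair is comparable only when the subject with the shorter time had an observed event, counts tied risk scores as half-concordant, and ignores pairs with tied times; real packages differ in how they treat ties. The function name and data are invented for the example.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs in which the higher-risk subject fails first."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            # Comparable pair: subject i has an observed event strictly before time j.
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy data, invented for illustration.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]            # the third subject is censored at t=6
risk_scores = [0.9, 0.3, 0.5, 0.1]

c_index = concordance_index(times, events, risk_scores)
print(f"C-index = {c_index:.2f}")
```

Here four of the five comparable pairs are ordered correctly; the one miss is the subject with an event at t=4 who was given a lower risk score than the subject censored at t=6.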

Common Pitfalls

  • When the proportional hazards assumption is violated, a single hazard ratio averages over an effect that changes with time and can be biased and misleading. Always test this assumption using Schoenfeld residuals.
  • With many covariates relative to the number of events (not sample size), the model is prone to overfitting. A common guideline is at least 10 events per covariate.
  • Ties in event times require special handling (Breslow or Efron method). The choice can affect results when ties are frequent.
  • The hazard ratio is not a relative risk. It compares instantaneous event rates, not cumulative probabilities.
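
The difference between a hazard ratio and a relative risk can be seen with a small calculation. The sketch below assumes constant (exponential) hazards in both groups, purely for simplicity; the baseline hazard and hazard ratio are hypothetical values. The hazard ratio stays fixed at 2, while the ratio of cumulative event probabilities shrinks toward 1 as follow-up lengthens.

```python
import math

def cumulative_risk(hazard, t):
    """Probability of having had the event by time t under a constant hazard."""
    return 1.0 - math.exp(-hazard * t)

baseline_hazard = 0.1   # hypothetical constant hazard in the reference group
hazard_ratio = 2.0      # hypothetical hazard ratio, constant over time

risk_ratios = {}
for t in (1, 10, 30):
    risk_reference = cumulative_risk(baseline_hazard, t)
    risk_exposed = cumulative_risk(baseline_hazard * hazard_ratio, t)
    risk_ratios[t] = risk_exposed / risk_reference
    print(f"t={t:>2}: risk ratio = {risk_ratios[t]:.3f}, hazard ratio = {hazard_ratio}")
```

Early in follow-up the two measures are close, which is why they are often confused; they diverge as events accumulate.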

How It Works

  1. Model the hazard function as h(t | X) = h0(t) * exp(b1*X1 + b2*X2 + ...), where h0(t) is an unspecified baseline hazard shared by all subjects.
  2. Estimate coefficients by maximising the partial likelihood, which depends only on the ordering of event times; h0(t) cancels out, so it never has to be specified.
  3. At each event time, the partial likelihood contribution is the conditional probability that the observed subject was the one to experience the event, among all subjects still at risk at that time.
  4. Exponentiate each coefficient to obtain hazard ratios for interpretation.
  4. Exponentiate each coefficient to obtain hazard ratios for interpretation.
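
The steps above can be sketched in a few lines for the simplest case of a single covariate. This is a minimal illustration, not a description of any particular package: it uses Newton-Raphson on the partial log-likelihood, handles tied event times with the Breslow approximation, and takes the standard error from the inverse of the observed information. The function names and the toy data are assumptions made for the example.

```python
import math

def score_and_information(beta, times, events, x):
    """First derivative and negative second derivative of the partial log-likelihood."""
    score = 0.0
    information = 0.0
    for t_i, d_i, x_i in zip(times, events, x):
        if d_i != 1:
            continue  # censored subjects contribute only through risk sets
        s0 = s1 = s2 = 0.0
        for t_j, x_j in zip(times, x):
            if t_j >= t_i:  # subject j is still at risk at time t_i
                w = math.exp(beta * x_j)
                s0 += w
                s1 += w * x_j
                s2 += w * x_j * x_j
        mean_x = s1 / s0  # risk-weighted mean covariate in the risk set
        score += x_i - mean_x
        information += s2 / s0 - mean_x ** 2
    return score, information

def fit_cox_single(times, events, x, max_iter=25, tol=1e-10):
    """Return (beta, standard error) for a one-covariate Cox model."""
    beta = 0.0
    for _ in range(max_iter):
        score, information = score_and_information(beta, times, events, x)
        step = score / information  # Newton-Raphson update
        beta += step
        if abs(step) < tol:
            break
    # Re-evaluate the information at the final estimate for the standard error.
    _, information = score_and_information(beta, times, events, x)
    return beta, 1.0 / math.sqrt(information)

# Toy data, invented for illustration.
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 0, 1, 1]   # 1 = event, 0 = censored
x = [1, 0, 1, 1, 0, 0]        # a single binary covariate

beta_hat, se = fit_cox_single(times, events, x)
hazard_ratio = math.exp(beta_hat)
print(f"beta = {beta_hat:.4f}, SE = {se:.4f}, HR = {hazard_ratio:.4f}")
```

Note that the baseline hazard h0(t) never appears in the code: only the membership of each risk set matters. With several covariates the same scheme applies, with the score becoming a vector and the information a matrix.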

Citations

References

  • Cox, D. R. (1972). Regression models and life-tables. Journal of the Royal Statistical Society B, 34(2), 187-220.