Cox Proportional Hazards Regression
Survival Analysis
Models the effect of one or more covariates on the hazard rate (instantaneous risk of the event) without specifying the baseline hazard function, producing hazard ratios as the primary output.
When to Use
Use this model when you want to determine which factors predict longer or shorter survival while adjusting for confounders. For example, modelling the effect of treatment, age, and tumour stage on overall survival in cancer patients.
Assumptions
- Proportional hazards: the hazard ratio between any two subjects remains constant over time. Assess with Schoenfeld residuals or log-log survival plots.
- Censoring is non-informative.
- The log-hazard is a linear function of the covariates.
- Observations (subjects) are independent.
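
The Schoenfeld residual check mentioned above can be sketched for a single covariate as follows. This is a minimal illustration, not the tool's implementation: the function name `schoenfeld_residuals`, the toy data, and the assumption of no tied event times are all hypothetical. For each observed event, the residual is the subject's covariate value minus the risk-weighted average of that covariate among subjects still at risk.

```python
import math

def schoenfeld_residuals(times, events, x, beta):
    """Schoenfeld residuals for one covariate (sketch; assumes no tied event times)."""
    residuals = []
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i != 1:
            continue  # censored subjects contribute no residual
        # Risk set: every subject whose observed time is >= this event time
        risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
        weights = [math.exp(beta * x[j]) for j in risk]
        expected_x = sum(w * x[j] for w, j in zip(weights, risk)) / sum(weights)
        residuals.append((t_i, x[i] - expected_x))
    return sorted(residuals)

# Hypothetical toy data: 5 subjects, one binary covariate
times = [2, 3, 5, 7, 11]
events = [1, 0, 1, 1, 0]
x = [1, 0, 1, 0, 0]

# beta would normally be the fitted coefficient; 0.0 is used here for illustration
res = schoenfeld_residuals(times, events, x, beta=0.0)
for t, r in res:
    print(t, round(r, 3))
```

If proportional hazards holds, these residuals should show no systematic trend when plotted against event time; a visible slope suggests the covariate's effect changes over follow-up.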
Required Inputs
| Input | Type | Notes |
|---|---|---|
| Time to Event | Numeric | Survival times |
| Event Indicator | Binary (0/1) | 1 = event, 0 = censored |
| Covariates | Numeric / Categorical | One or more predictor variables |
Output Metrics
| Metric | What it means |
|---|---|
| Log-Likelihood | Partial log-likelihood of the fitted model. |
| AIC | Akaike Information Criterion for model comparison. |
| Concordance C-Index | Discrimination measure: among comparable pairs of subjects (pairs in which it is known who experienced the event first), the proportion in which the subject with the higher predicted risk experiences the event first. 0.5 = no discrimination, 1.0 = perfect. |
| Likelihood Ratio Chi-Sq | Global test comparing the model to the null model. |
| Wald Chi-Sq | Alternative global test based on Wald statistics. |
| Score Chi-Sq | Score test (efficient score) for the global null hypothesis. |
| Coefficient (log HR) | Log hazard ratio for each covariate. |
| SE | Standard error of each coefficient. |
| z | Z-statistic (coefficient / SE). |
| Pr > \|z\| | P-value for each covariate. |
| Hazard Ratio (HR) | Exponentiated coefficient. HR > 1 = increased risk; HR < 1 = protective effect. |
| HR 95% CL Lower | Lower confidence limit for the hazard ratio. |
| HR 95% CL Upper | Upper confidence limit for the hazard ratio. |
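
The per-covariate rows of this table are all derived from the coefficient and its standard error. The sketch below shows one way to reproduce them, assuming a two-sided Wald test and a normal-based 95% interval computed on the log scale; the function name `wald_summary` and the input values are hypothetical.

```python
import math

def wald_summary(coef, se, z_crit=1.959964):
    """Derive z, p-value, HR and 95% confidence limits from a log hazard ratio and its SE."""
    z = coef / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2))
    return {
        "z": z,
        "p": p,
        "hr": math.exp(coef),
        # Limits are built on the log scale, then exponentiated
        "hr_lower": math.exp(coef - z_crit * se),
        "hr_upper": math.exp(coef + z_crit * se),
    }

# Hypothetical inputs: log HR = 0.5 with SE = 0.2
row = wald_summary(coef=0.5, se=0.2)
for name, value in row.items():
    print(f"{name}: {value:.4f}")
```

Because the interval is symmetric on the log scale, it is asymmetric around the hazard ratio itself.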
Interpretation
- A hazard ratio > 1 means the covariate is associated with increased risk (shorter survival). HR < 1 means reduced risk (longer survival).
- The HR confidence interval is the key result: if the 95% interval includes 1.0, the covariate's effect is not statistically significant at the 5% level.
- The concordance index (C-index) measures how well the model discriminates between subjects who experience the event sooner versus later. Values of 0.7-0.8 are often considered good.
- Check the proportional hazards assumption. If violated, consider stratified Cox models, time-varying coefficients, or alternative models.
- Unlike logistic regression, Cox regression models the rate at which events occur (hazard), not the probability of the event.
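
To make the C-index concrete, here is a small pairwise computation in the style of Harrell's C. It is a sketch under stated assumptions: the function name `concordance_index` and the data are hypothetical, pairs with tied times are skipped, and tied risk scores count as half concordant. The source does not say how the tool handles ties, so treat those choices as illustrative.

```python
def concordance_index(times, events, risk):
    """Pairwise concordance (sketch): share of comparable pairs ordered correctly by risk."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i had an observed event
            # strictly before subject j's observed time
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0   # earlier event had the higher predicted risk
                elif risk[i] == risk[j]:
                    concordant += 0.5   # tied predictions count as half
    return concordant / comparable

# Hypothetical toy data; risk could be the linear predictor or exp(linear predictor)
times = [2, 3, 5, 7, 11]
events = [1, 0, 1, 1, 0]
risk = [2.0, 0.5, 1.5, 1.0, 0.2]

c_good = concordance_index(times, events, risk)
c_flat = concordance_index(times, events, [1.0] * 5)
c_bad = concordance_index(times, events, [-r for r in risk])
print(c_good, c_flat, c_bad)
```

Pairs in which the earlier time is censored are not comparable, because it is unknown which subject would have experienced the event first.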
Common Pitfalls
- Violating the proportional hazards assumption produces biased and misleading hazard ratios. Always test this assumption using Schoenfeld residuals.
- With many covariates relative to the number of events (not sample size), the model is prone to overfitting. A common guideline is at least 10 events per covariate.
- Ties in event times require special handling (Breslow or Efron method). The choice can affect results when ties are frequent.
- The hazard ratio is not a relative risk. It compares instantaneous event rates, not cumulative probabilities.
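
The difference between a hazard ratio and a relative risk can be shown numerically. Under proportional hazards, the survival curve of the exposed group equals the baseline survival raised to the power HR, so the ratio of cumulative event probabilities depends on how far follow-up has progressed. The function name `risk_ratio` and the baseline survival values below are hypothetical.

```python
def risk_ratio(baseline_survival, hr):
    """Ratio of cumulative event probabilities implied by a constant hazard ratio."""
    # Proportional hazards implies S_exposed(t) = S_baseline(t) ** HR
    exposed_survival = baseline_survival ** hr
    return (1 - exposed_survival) / (1 - baseline_survival)

# Hypothetical baseline survival at successively later follow-up times, HR fixed at 2
ratios = {s0: risk_ratio(s0, 2.0) for s0 in (0.99, 0.90, 0.50, 0.10)}
for s0, rr in ratios.items():
    print(f"baseline survival {s0:.2f}: cumulative risk ratio {rr:.2f}")
```

With HR = 2 the cumulative risk ratio simplifies to 1 + S0(t): it is close to 2 early on, when few events have occurred, and shrinks toward 1 as baseline survival falls, even though the hazard ratio never changes.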
How It Works
- Model the hazard function as h(t) = h0(t) * exp(b1*X1 + b2*X2 + ...), where h0(t) is an unspecified baseline hazard.
- Estimate coefficients using partial likelihood, which considers the relative ordering of event times without specifying h0(t).
- At each event time, the partial likelihood contribution is the probability that the observed subject was the one to experience the event among all subjects still at risk.
- Exponentiate each coefficient to obtain hazard ratios for interpretation.
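
The steps above can be sketched for a single covariate using Newton-Raphson on the partial log-likelihood. This is an illustrative toy, not the tool's algorithm: the function names, the data, and the convergence settings are hypothetical, and event times are assumed to be untied (with ties, this loop would amount to the Breslow approximation).

```python
import math

def score_and_information(beta, times, events, x):
    """First derivative and negative second derivative of the partial log-likelihood."""
    score, info = 0.0, 0.0
    for i in range(len(times)):
        if events[i] != 1:
            continue  # only observed events contribute a term
        # Risk set: subjects still under observation at this event time
        risk = [j for j in range(len(times)) if times[j] >= times[i]]
        w = [math.exp(beta * x[j]) for j in risk]
        s0 = sum(w)
        mean_x = sum(wj * x[j] for wj, j in zip(w, risk)) / s0
        mean_x2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk)) / s0
        score += x[i] - mean_x             # observed minus risk-weighted expected covariate
        info += mean_x2 - mean_x ** 2      # risk-weighted variance of the covariate
    return score, info

def fit_cox_one_covariate(times, events, x, max_iter=25, tol=1e-10):
    """Newton-Raphson estimate of the log hazard ratio and its standard error."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = score_and_information(beta, times, events, x)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    _, info = score_and_information(beta, times, events, x)
    return beta, 1.0 / math.sqrt(info)

# Hypothetical toy data: 8 subjects, distinct times, one binary covariate
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 0, 1, 1, 0, 1]
x = [1, 0, 1, 1, 0, 1, 0, 0]

beta_hat, se_hat = fit_cox_one_covariate(times, events, x)
print("log HR:", beta_hat, "SE:", se_hat, "HR:", math.exp(beta_hat))
```

Note that the baseline hazard h0(t) never appears: it cancels in each risk-set ratio, which is why only the ordering of event times matters.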
Citations
- Cox, D. R. (1972). Regression models and life-tables. Journal of the Royal Statistical Society B, 34(2), 187-220.