Nelson-Aalen Cumulative Hazard Estimator
Survival Analysis
Estimates the cumulative hazard function from time-to-event data, providing an alternative perspective to the Kaplan-Meier survival curve by focusing on the accumulated risk over time.
When to Use
Use this estimator when you want to understand how risk accumulates over time rather than how survival probability decreases. It is also used as a foundation for smoothed hazard rate estimation and as an input to other survival models.
Assumptions
- Censoring is non-informative.
- Events are independent across subjects.
Required Inputs
| Input | Type | Notes |
|---|---|---|
| Time to Event | Numeric | Survival times |
| Event Indicator | Binary (0/1) | 1 = event, 0 = censored |
Output Metrics
| Metric | What it means |
|---|---|
| Time | Each event time point. |
| Cumulative Hazard | Estimated cumulative hazard H(t) at each event time. Because H(t) = -log(S(t)), exp(-H(t)) approximates the Kaplan-Meier survival estimate. |
| 95% CI Lower | Lower confidence limit for the cumulative hazard. |
| 95% CI Upper | Upper confidence limit for the cumulative hazard. |
| Fixed Time Estimates | Cumulative hazard values at specific user-defined time points. |
| Smoothed Hazard | Kernel-smoothed instantaneous hazard rate estimate over time. |
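Fixed time estimates can be read off the step function: H(t) stays constant between event times and is 0 before the first event. The sketch below assumes the event times and cumulative hazard values are already available as arrays; the function name `cumulative_hazard_at` is hypothetical and not part of any particular library.

```python
import numpy as np

def cumulative_hazard_at(query_times, event_times, H):
    """Evaluate the right-continuous step function H(t) at arbitrary times."""
    event_times = np.asarray(event_times, dtype=float)
    H = np.asarray(H, dtype=float)
    # Number of event times <= each query time (0 means "before any event").
    idx = np.searchsorted(event_times, query_times, side="right")
    # Prepend 0 because H(t) = 0 before the first event.
    return np.concatenate(([0.0], H))[idx]
```

A query time beyond the last event time returns the last estimated value, which is only meaningful while subjects remain under observation.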
Interpretation
- The cumulative hazard H(t) represents the total accumulated risk up to time t. It starts at 0 and increases monotonically.
- A steep segment in the cumulative hazard curve indicates a period of high event rate. A flat segment indicates a period of low risk.
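- The Nelson-Aalen estimator and the Kaplan-Meier estimator are closely related: S(t) is approximately exp(-H(t)). When each event time removes only a small fraction of the subjects at risk, the two give nearly identical results; they diverge most at late times, when few subjects remain.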
- The smoothed hazard estimate reveals how the instantaneous risk changes over time, which the cumulative hazard and survival curves obscure.
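The relationship between the two estimators can be checked numerically. The following is a minimal sketch, assuming plain NumPy arrays of times and 0/1 event indicators; the helper name `km_and_na` is hypothetical. Since exp(-x) >= 1 - x, the survival curve derived from the Nelson-Aalen estimate never falls below the Kaplan-Meier curve.

```python
import numpy as np

def km_and_na(times, events):
    """Return event times, Kaplan-Meier S(t), and exp(-H(t)) from Nelson-Aalen."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    event_times = np.unique(times[events == 1])
    # d: events at each event time; n: subjects at risk just before it.
    d = np.array([np.sum((times == t) & (events == 1)) for t in event_times])
    n = np.array([np.sum(times >= t) for t in event_times])
    km = np.cumprod(1.0 - d / n)   # Kaplan-Meier product-limit estimate
    na = np.cumsum(d / n)          # Nelson-Aalen cumulative hazard
    return event_times, km, np.exp(-na)
```

With many subjects at risk the two curves are almost indistinguishable; the gap widens toward the end of follow-up, where each event removes a large share of the remaining risk set.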
Common Pitfalls
- The cumulative hazard can exceed 1.0 (unlike survival probability), which sometimes confuses researchers unfamiliar with hazard functions.
- Smoothed hazard estimates depend on the choice of bandwidth. Too narrow a bandwidth produces noisy estimates; too wide smooths over real features.
- With few events, the cumulative hazard estimate has wide confidence intervals, especially at later time points when few subjects remain at risk.
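The bandwidth sensitivity can be illustrated with a simple kernel smoother applied to the hazard increments. This is a sketch under stated assumptions: it uses an Epanechnikov kernel (one common choice, not necessarily the one a given tool uses), applies no boundary correction, and the function name `smoothed_hazard` is hypothetical.

```python
import numpy as np

def smoothed_hazard(grid, event_times, increments, bandwidth):
    """Kernel-smooth hazard increments d_i / n_i into a hazard rate on a grid."""
    grid = np.asarray(grid, dtype=float)
    event_times = np.asarray(event_times, dtype=float)
    increments = np.asarray(increments, dtype=float)
    # Scaled distance from every grid point to every event time.
    u = (grid[:, None] - event_times[None, :]) / bandwidth
    # Epanechnikov kernel: 0.75 * (1 - u^2) on [-1, 1], zero elsewhere.
    weights = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)
    return weights @ increments / bandwidth

# Illustrative (made-up) event times and increments.
grid = np.linspace(0.0, 10.0, 101)
event_times = [2.0, 3.0, 5.0, 8.0]
increments = [1 / 8, 1 / 7, 1 / 5, 2 / 3]
narrow = smoothed_hazard(grid, event_times, increments, bandwidth=0.5)
wide = smoothed_hazard(grid, event_times, increments, bandwidth=4.0)
```

Plotting `narrow` and `wide` against `grid` shows the trade-off: the narrow bandwidth yields isolated spikes at each event time, while the wide one blends them into a single smooth curve.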
How It Works
- At each event time t_i, estimate the hazard increment as the number of events divided by the number at risk: d_i / n_i.
- Sum these increments: H(t) = sum of d_i / n_i over all event times t_i up to and including t.
- Compute the variance with a Greenwood-type formula (a common choice is the sum of d_i / n_i^2 over event times up to t) and construct confidence intervals.
- Optionally smooth the hazard increments using a kernel smoother to estimate the instantaneous hazard rate function.
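The first three steps can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function name `nelson_aalen` is hypothetical, the variance uses the sum of d_i / n_i^2 (one common estimator among several), and the confidence interval uses a log transformation, which is an assumption about how the limits are built.

```python
import numpy as np

def nelson_aalen(times, events, z=1.96):
    """Return event times, H(t), its variance, and lower/upper CI limits."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)

    # Distinct times at which at least one event occurred.
    event_times = np.unique(times[events == 1])

    # d_i: events at each event time; n_i: subjects at risk just before it.
    # Subjects censored exactly at an event time still count as at risk there.
    d = np.array([np.sum((times == t) & (events == 1)) for t in event_times])
    n = np.array([np.sum(times >= t) for t in event_times])

    H = np.cumsum(d / n)        # cumulative hazard: running sum of d_i / n_i
    var = np.cumsum(d / n**2)   # assumed variance estimator: sum of d_i / n_i^2

    # Log-transformed interval (assumption): keeps the lower limit above zero.
    half = z * np.sqrt(var) / H
    return event_times, H, var, H * np.exp(-half), H * np.exp(half)
```

Calling `nelson_aalen(times, events)` with a numeric time array and a 0/1 event indicator returns one row of estimates per distinct event time, matching the Time, Cumulative Hazard, and confidence limit outputs described above.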
References
- Nelson, W. (1972). Theory and applications of hazard plotting for censored failure data. Technometrics, 14(4), 945-966.
- Aalen, O. O. (1978). Nonparametric inference for a family of counting processes. Annals of Statistics, 6(4), 701-726.