Kaplan-Meier Survival Analysis

Estimates the survival function from time-to-event data, accounting for censored observations, and optionally compares survival curves between groups using the log-rank test.

When to Use

Use this analysis when you have time-to-event data with censoring and want to estimate the probability of surviving beyond a given time point. Examples include estimating overall survival in a cancer trial, comparing time to relapse between treatment groups, and analysing time to equipment failure.

Assumptions

Censoring is non-informative: censored subjects have the same survival prospects as those who remain in the study at the same time point.
Survival probability depends only on time since the origin event, not on calendar time.
Events are independent across subjects.

Required Inputs

Input	Type	Notes
Time to Event	Numeric	Survival or follow-up time for each subject
Event Indicator	Binary (0/1)	1 = event occurred, 0 = censored
Group (optional)	Categorical	Optional grouping variable for comparing survival curves

Output Metrics

Metric	What it means
N	Total number of subjects in each group.
N Events	Number of subjects who experienced the event.
N Censored	Number of subjects who were censored (event not observed).
Median Survival	Earliest time at which the estimated survival probability falls to 0.50 or below. Not estimable if the survival curve never reaches 0.50, which typically happens when fewer than 50% of subjects experienced the event.
95% CL Lower (Median)	Lower confidence limit for the median survival time.
95% CL Upper (Median)	Upper confidence limit for the median survival time.
Quartile Estimates	Survival times at which S(t) crosses 0.75 and 0.25, when estimable.
Log-Rank Chi-Square	Test statistic for comparing survival curves between groups.
Log-Rank DF	Degrees of freedom for the log-rank test (number of groups - 1).
Log-Rank Pr > Chi-Square	P-value for the log-rank test.
Survival Probability Table	Estimated S(t) at each event time, with confidence intervals.

Interpretation

The survival curve shows the estimated probability of surviving beyond each time point. A steep drop indicates many events occurring in a short period.
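Median survival is the time at which the estimated survival probability first falls to 0.50 or below, so about half of subjects are expected to have experienced the event by then. It is the most commonly reported summary measure.
The log-rank test compares entire survival curves between groups. A significant result (p < alpha) is evidence that survival differs between groups somewhere over follow-up; it does not indicate when or by how much.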
Confidence intervals for the survival function are typically computed using the log-log transformation, which ensures they stay within [0, 1].
Kaplan-Meier is descriptive and does not adjust for confounders. Use Cox regression to adjust for covariates.
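
The log-log confidence interval mentioned above can be sketched as follows. This is a minimal illustration, not the procedure any particular package uses: it assumes Greenwood's formula for the variance, a normal critical value of 1.96, and a hypothetical function name (loglog_ci). The example numbers are made up.

```python
import math


def loglog_ci(surv, risk_event_pairs, z=1.96):
    """Log-log confidence interval for a Kaplan-Meier estimate S(t).

    surv: the estimate S(t), strictly between 0 and 1.
    risk_event_pairs: (number at risk, number of events) at each event
        time up to and including t. Assumes n > d at every event time.
    """
    # Greenwood's sum: d / (n * (n - d)) over event times up to t.
    greenwood = sum(d / (n * (n - d)) for n, d in risk_event_pairs)
    # Standard error of log(-log S(t)) by the delta method.
    se = math.sqrt(greenwood) / abs(math.log(surv))
    # Back-transforming keeps both limits inside (0, 1).
    lower = surv ** math.exp(z * se)
    upper = surv ** math.exp(-z * se)
    return lower, upper


# Hypothetical example: S(t) = 4/9 after three event times.
lower, upper = loglog_ci(4 / 9, [(6, 1), (5, 1), (3, 1)])
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```

With few subjects at risk the interval is wide, which is one reason to report the number at risk alongside the curve.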

Common Pitfalls

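If censoring is informative, the Kaplan-Meier estimate is biased. For example, if sicker patients drop out, the estimate is biased upward (survival is overestimated).
Median survival cannot be estimated if the survival curve never falls to 0.50, which typically happens when fewer than half the subjects have experienced the event. Longer follow-up is needed to estimate it; until then, report the survival probability at a fixed time point instead.
The log-rank test has optimal power when hazards are proportional. If survival curves cross, early and late differences can cancel out, and the test may fail to detect a real difference.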
Late censoring can leave very few subjects at risk, producing unreliable estimates in the tail of the curve. Report the number at risk alongside the survival curve.

How It Works

At each event time, calculate the conditional probability of surviving past that time: (number at risk - number of events) / number at risk.
Multiply these conditional probabilities cumulatively to get S(t), the survival function.
Censored observations reduce the number at risk but do not count as events.
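For the log-rank test: at each event time, compute the expected number of events in each group under the null hypothesis of equal survival, then sum the (observed - expected) differences over all event times. For two groups, square this sum and divide by its variance; with more groups the statistic is the corresponding quadratic form. Compare the result to a chi-square distribution with (number of groups - 1) degrees of freedom.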
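
The steps above can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the implementation behind this analysis: the function names (kaplan_meier, logrank_two_groups) and the data are hypothetical, the log-rank version handles two groups only, and subjects censored at an event time are treated as still at risk at that time.

```python
import math
from collections import Counter


def kaplan_meier(times, events):
    """Return (time, n_at_risk, n_events, S(t)) for each distinct event time."""
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    exits = Counter(times)  # events and censorings both leave the risk set
    at_risk = len(times)
    surv = 1.0
    table = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d > 0:
            surv *= (at_risk - d) / at_risk  # conditional survival past t
            table.append((t, at_risk, d, surv))
        at_risk -= exits[t]  # remove everyone whose follow-up ended at t
    return table


def logrank_two_groups(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi_square, p_value) on 1 df.

    Assumes both groups have subjects at risk at one or more event times,
    so the variance is positive.
    """
    times = list(times_a) + list(times_b)
    events = list(events_a) + list(events_b)
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)  # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)  # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e == 1)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e == 1)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n  # observed minus expected in A
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi_square = obs_minus_exp ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


# Hypothetical data: 1 = event occurred, 0 = censored.
a_times, a_events = [1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1]
b_times, b_events = [3, 4, 6, 7, 8, 8], [1, 0, 1, 0, 1, 0]

for row in kaplan_meier(a_times, a_events):
    print(row)
print(logrank_two_groups(a_times, a_events, b_times, b_events))
```

Each row of the Kaplan-Meier output also carries the number at risk, which makes it easy to see where the tail of the curve rests on very few subjects.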

References

Kaplan, E. L., & Meier, P. (1958). Nonparametric estimation from incomplete observations. Journal of the American Statistical Association, 53(282), 457-481.
Mantel, N. (1966). Evaluation of survival data and two new rank order statistics arising in its consideration. Cancer Chemotherapy Reports, 50(3), 163-170.
