Multinomial Logistic Regression
Extends logistic regression to outcomes with three or more unordered categories, modelling the log-odds of each category relative to a reference category.
When to Use
Use this test when your outcome has three or more unordered categories and you want to predict category membership from one or more predictors. For example, predicting tumour subtype (basal, luminal A, luminal B) from gene expression features, or predicting transport mode choice (car, bus, bicycle) from demographic variables.
Assumptions
- The dependent variable is nominal with 3+ unordered categories.
- Observations are independent.
- Independence of irrelevant alternatives (IIA): the odds of choosing category A over B should not change if category C is removed.
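- No severe multicollinearity among predictors; highly correlated predictors inflate standard errors and destabilise the coefficient estimates.
- Adequate sample size in each outcome category (as a rule of thumb, at least 10 observations per predictor per category).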
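The IIA assumption is built into the model's functional form, which a small numerical sketch can make concrete. The linear-predictor values below are hypothetical and the `softmax` helper is written only for illustration; the point is that the bus-versus-car odds come out the same whether or not bicycle is among the alternatives.

```python
import math

def softmax(scores):
    """Turn a dict of category -> linear predictor into probabilities."""
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {c: math.exp(s - m) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}

# Hypothetical linear predictors for one traveller; the reference
# category "car" is fixed at 0.
scores = {"car": 0.0, "bus": -0.4, "bicycle": -1.1}

full = softmax(scores)
without_bicycle = softmax({c: s for c, s in scores.items() if c != "bicycle"})

# Odds of bus relative to car, with and without the third alternative.
odds_full = full["bus"] / full["car"]
odds_reduced = without_bicycle["bus"] / without_bicycle["car"]
```

Both odds equal exp(-0.4). If real choices do not behave this way (for example, when two alternatives are close substitutes), the fitted model cannot reproduce that pattern.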
Required Inputs
| Input | Type | Notes |
|---|---|---|
| Outcome (Y) | Categorical (3+ levels) | Dependent variable with three or more unordered categories |
| Predictors | Numeric / Categorical | One or more independent variables |
Output Metrics
| Metric | What it means |
|---|---|
| Per-Class Coefficients (log-odds) | Log-odds coefficient for each predictor, for each category versus the reference category. |
| Per-Class Odds Ratios | Exponentiated coefficients: the multiplicative change in the odds of being in a given category rather than the reference category, per one-unit increase in the predictor. |
| Accuracy | Overall proportion of correctly classified observations. |
| Precision (per class) | Proportion of predicted class members that truly belong to that class. |
| Recall (per class) | Proportion of actual class members that were correctly predicted. |
| F1 Score (per class) | Harmonic mean of precision and recall for each class. |
| Confusion Matrix | Cross-tabulation of predicted versus observed class memberships. |
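The classification metrics in the table can all be derived from the confusion matrix. The sketch below is a minimal pure-Python illustration, not the tool's actual implementation; the function name `classification_metrics` and the example labels are assumptions made for this example.

```python
def classification_metrics(y_true, y_pred):
    """Return labels, confusion matrix, accuracy and per-class metrics."""
    labels = sorted(set(y_true) | set(y_pred))
    index = {label: i for i, label in enumerate(labels)}

    # Rows are observed classes, columns are predicted classes.
    matrix = [[0] * len(labels) for _ in labels]
    for observed, predicted in zip(y_true, y_pred):
        matrix[index[observed]][index[predicted]] += 1

    accuracy = sum(matrix[i][i] for i in range(len(labels))) / len(y_true)

    per_class = {}
    for label, i in index.items():
        true_pos = matrix[i][i]
        n_predicted = sum(row[i] for row in matrix)  # column total
        n_actual = sum(matrix[i])                    # row total
        precision = true_pos / n_predicted if n_predicted else 0.0
        recall = true_pos / n_actual if n_actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_class[label] = {"precision": precision, "recall": recall, "f1": f1}
    return labels, matrix, accuracy, per_class

# Hypothetical observed and predicted transport modes.
y_true = ["car", "car", "car", "bus", "bus", "bicycle", "bicycle", "bicycle"]
y_pred = ["car", "car", "bus", "bus", "bus", "bicycle", "car", "bicycle"]
labels, matrix, accuracy, per_class = classification_metrics(y_true, y_pred)
```

Defining precision and recall as 0 when a class is never predicted or never observed is a convention chosen here to avoid division by zero; other tools may report such cases differently.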
Interpretation
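- Each set of coefficients compares one category to the reference category. An odds ratio > 1 for predictor X in category B (vs. reference A) means that a one-unit increase in X multiplies the odds of being in category B rather than A by that ratio, holding the other predictors constant; an odds ratio < 1 means those odds decrease.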
- There are (k-1) sets of coefficients for k outcome categories. The reference category has all coefficients implicitly set to zero.
- Per-class precision and recall help identify which categories the model predicts well and which it confuses.
- The confusion matrix shows the most common misclassification patterns.
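As a small illustration of the first two points, the sketch below converts log-odds coefficients into odds ratios. The coefficient values and predictor names (`gene_x`, `age`) are invented for the example and do not come from any fitted model.

```python
import math

# Hypothetical coefficients (log-odds) for k = 3 tumour subtypes.
# "luminal A" is the reference, so only k - 1 = 2 coefficient sets exist.
coefficients = {
    "basal": {"gene_x": 0.69, "age": -0.05},
    "luminal B": {"gene_x": 0.10, "age": 0.02},
}

# Exponentiate each coefficient to obtain the odds ratio versus the reference.
odds_ratios = {
    category: {name: math.exp(beta) for name, beta in betas.items()}
    for category, betas in coefficients.items()
}

# With these made-up numbers, a one-unit increase in gene_x roughly doubles
# the odds of "basal" relative to "luminal A", while each additional unit of
# age slightly lowers those odds (odds ratio below 1).
```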
Common Pitfalls
- The IIA assumption can be problematic. If removing one category changes the relative odds of others, multinomial logistic regression may be inappropriate.
- With many categories or many predictors, the number of parameters grows quickly, increasing the risk of overfitting.
- Rare categories with very few observations produce unreliable coefficient estimates. Consider collapsing similar categories.
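Two of these pitfalls can be checked before fitting. The sketch below counts the parameters the model would estimate and flags categories that fall short of the 10-observations-per-predictor rule of thumb from the Assumptions section. Both helper names are hypothetical, and `n_predictors` is assumed to count model columns (after any dummy coding of categorical predictors).

```python
from collections import Counter

def parameter_count(n_categories, n_predictors, intercept=True):
    """One coefficient set per non-reference category."""
    per_set = n_predictors + (1 if intercept else 0)
    return (n_categories - 1) * per_set

def sparse_categories(outcomes, n_predictors, per_predictor=10):
    """Return categories with fewer observations than the rule of thumb."""
    needed = per_predictor * n_predictors
    counts = Counter(outcomes)
    return {category: n for category, n in counts.items() if n < needed}

# Hypothetical outcome vector: three categories, one of them rare.
outcomes = ["a"] * 40 + ["b"] * 25 + ["c"] * 5

n_params = parameter_count(n_categories=3, n_predictors=2)
too_small = sparse_categories(outcomes, n_predictors=2)
```

Here the model would estimate 6 parameters, and category `c` (5 observations against a threshold of 20) would be a candidate for collapsing into a similar category.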
How It Works
- For each non-reference category, model the log-odds relative to the reference category as a linear combination of predictors.
- Estimate all coefficient sets simultaneously using maximum likelihood.
- Predicted probabilities for each category are computed by applying the softmax function to the linear predictors, with the reference category's linear predictor fixed at zero.
- Assign each observation to the category with the highest predicted probability for classification.
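The prediction steps (linear predictors, softmax, highest-probability assignment) can be sketched in a few lines. This is an illustrative sketch only: it takes already-estimated coefficients as given and leaves out the maximum likelihood fitting step. The function names, coefficient values and predictor values are all hypothetical.

```python
import math

def predict_proba(x, coefficients, reference):
    """Category probabilities for one observation.

    coefficients maps each non-reference category to (intercept, slopes);
    the reference category's linear predictor is fixed at 0.
    """
    eta = {reference: 0.0}
    for category, (intercept, slopes) in coefficients.items():
        eta[category] = intercept + sum(b * v for b, v in zip(slopes, x))

    # Softmax, subtracting the maximum for numerical stability.
    m = max(eta.values())
    exps = {c: math.exp(e - m) for c, e in eta.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}

def classify(x, coefficients, reference):
    """Assign the category with the highest predicted probability."""
    probs = predict_proba(x, coefficients, reference)
    return max(probs, key=probs.get)

# Hypothetical coefficients for transport mode, with "car" as the reference.
coefs = {"bus": (-0.5, [0.8, -0.2]), "bicycle": (-1.0, [0.1, 0.9])}
x = [1.0, 2.0]  # two made-up predictor values for one person

probs = predict_proba(x, coefs, reference="car")
predicted = classify(x, coefs, reference="car")
```

Because the reference category's linear predictor is zero, the log of `probs["bus"] / probs["car"]` recovers the bus linear predictor, which matches the log-odds formulation in the first step.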
References
- McFadden, D. (1974). Conditional logit analysis of qualitative choice behavior. In P. Zarembka (Ed.), Frontiers in Econometrics (pp. 105-142). Academic Press.