Mann-Whitney U Test

Hypothesis Testing

A non-parametric test that compares the distributions of two independent groups by ranking all observations together, without assuming normality.

When to Use

Use this test when you want to compare two independent groups but cannot assume normality, when data are ordinal, or when sample sizes are small. For example, comparing patient satisfaction ratings (on a Likert scale) between two hospitals, or comparing reaction times when the data are heavily skewed.

Assumptions

  • Observations in each group are independent.
  • The dependent variable is at least ordinal (ranks are meaningful).
  • The two groups have similarly shaped distributions if you want to interpret the test as a comparison of medians. If the shapes differ, the test instead assesses whether one group tends to produce larger values than the other (stochastic dominance).

Required Inputs

Input    Type     Notes
Group 1  Numeric  Values for the first group
Group 2  Numeric  Values for the second group

Parameter    Default    Options
Alternative  two-sided  two-sided / less / greater

Output Metrics

Metric             What it means
N                  Number of observations in each group.
Median             Median value in each group.
Rank Sum           Sum of the assigned ranks in each group.
Mann-Whitney U     The Mann-Whitney U statistic.
Expected U (H0)    Expected value of U if the null hypothesis is true.
Std Dev U (H0)     Standard deviation of U under the null hypothesis.
Z                  Standardised test statistic (normal approximation).
Pr > |Z|           Two-tailed p-value based on the normal approximation.
Effect Size (r)    Effect size: proportion of favourable pairs minus unfavourable pairs. Ranges from -1 to +1.
Median Difference  Difference between group medians.
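
For reference, the null-hypothesis quantities in the table are usually computed with the textbook formulas below, where n_1 and n_2 are the two group sizes (the N values above). This is a sketch of the standard no-ties form only; whether easyCris applies a tie or continuity correction on top of it is not stated here, so treat the exact form as an assumption.

```latex
\begin{aligned}
\mathrm{E}[U \mid H_0] &= \frac{n_1 n_2}{2}, \\
\mathrm{SD}[U \mid H_0] &= \sqrt{\frac{n_1 n_2 \,(n_1 + n_2 + 1)}{12}}, \\
Z &= \frac{U - \mathrm{E}[U \mid H_0]}{\mathrm{SD}[U \mid H_0]}.
\end{aligned}
```

For example, with n_1 = n_2 = 3 the expected value is 4.5 and the standard deviation is sqrt(5.25), roughly 2.29.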

Interpretation

  • If Pr > |Z| is less than your chosen significance level (alpha), the two groups differ significantly in their rank distributions.
  • The rank-biserial correlation (r) quantifies the effect size. easyCris interprets its magnitude as negligible (<0.1), small (<0.3), medium (<0.5), or large (>=0.5).
  • A positive rank-biserial r indicates that Group 1 tends to have higher values than Group 2.
  • The median difference provides a practical summary, but remember the test is based on ranks, not medians directly.
  • With very small samples, consider using an exact p-value rather than the normal approximation.
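
As an illustration of the effect size described above, the sketch below computes the rank-biserial r by counting favourable and unfavourable pairs directly and then applies the listed thresholds. The function names `rank_biserial` and `effect_label` are hypothetical, and applying the thresholds to the absolute value of r is an assumption; this is not easyCris's own code.

```python
def rank_biserial(group1, group2):
    """Proportion of favourable pairs minus proportion of unfavourable pairs.

    A pair (x, y) with x from Group 1 and y from Group 2 is favourable
    when x > y and unfavourable when x < y; tied pairs count for neither.
    """
    favourable = sum(1 for x in group1 for y in group2 if x > y)
    unfavourable = sum(1 for x in group1 for y in group2 if x < y)
    return (favourable - unfavourable) / (len(group1) * len(group2))


def effect_label(r):
    """Map r to the labels listed above (assumption: thresholds use |r|)."""
    size = abs(r)
    if size < 0.1:
        return "negligible"
    if size < 0.3:
        return "small"
    if size < 0.5:
        return "medium"
    return "large"


if __name__ == "__main__":
    r = rank_biserial([1, 3], [2, 4])
    print(r, effect_label(r))
```

A positive result means Group 1 wins more pairwise comparisons than it loses, which matches the sign convention in the list above.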

Common Pitfalls

  • The test does not compare medians unless the two distributions have the same shape. With different shapes, a significant result means one group tends to produce larger values.
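  • Tied values reduce the precision of the test. Ties receive averaged ranks and shrink the variance of U, so a tie correction to the standard deviation is needed; the continuity correction addresses only the discreteness of U and does not resolve the issue.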
  • The normal approximation for the p-value becomes less accurate with very small sample sizes (N < 10 per group).
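
To make the small-sample pitfall concrete, the sketch below compares the normal-approximation p-value with an exact p-value obtained by enumerating every way of splitting the pooled observations into two groups of the original sizes. Both function names are hypothetical, the normal approximation here uses no tie or continuity correction, and the enumeration is only practical for small samples.

```python
import math
from itertools import combinations
from statistics import NormalDist


def _u(g1, g2):
    # Number of pairs in which the g1 value is larger; ties count one half.
    return sum((x > y) + 0.5 * (x == y) for x in g1 for y in g2)


def normal_two_sided_p(group1, group2):
    """Two-sided p-value from the normal approximation (no corrections)."""
    n1, n2 = len(group1), len(group2)
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (_u(group1, group2) - n1 * n2 / 2) / sd
    return 2 * (1 - NormalDist().cdf(abs(z)))


def exact_two_sided_p(group1, group2):
    """Two-sided p-value by enumerating all splits of the pooled data."""
    pooled = list(group1) + list(group2)
    n1, n2 = len(group1), len(group2)
    centre = n1 * n2 / 2
    observed = abs(_u(group1, group2) - centre)

    extreme = 0
    total = 0
    for idx in combinations(range(len(pooled)), n1):
        chosen = set(idx)
        g1 = [pooled[i] for i in idx]
        g2 = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Count splits at least as far from the null centre as the data.
        if abs(_u(g1, g2) - centre) >= observed - 1e-12:
            extreme += 1
    return extreme / total


if __name__ == "__main__":
    a, b = [1, 2, 3], [4, 5, 6]
    print("normal approximation:", round(normal_two_sided_p(a, b), 4))
    print("exact enumeration:   ", exact_two_sided_p(a, b))
```

With three perfectly separated observations per group, only 2 of the 20 possible splits are as extreme as the data, so the exact p-value is 0.1, while the normal approximation falls just under 0.05. The two methods can therefore disagree about significance at alpha = 0.05 when samples are this small.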

How It Works

  1. Combine all observations from both groups and assign ranks from smallest to largest.
  2. Sum the ranks for each group separately.
  3. Compute the U statistic as the number of times an observation from Group 1 precedes an observation from Group 2 in the combined ranking, counting each tie as one half. This count can be obtained directly from the rank sums.
  4. Standardise U to a Z-score using the expected value and standard deviation under the null hypothesis, then compute the p-value from the normal distribution.
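
The four steps above can be sketched in plain Python as follows. The function names and the returned dictionary keys are hypothetical, the standard deviation uses the textbook no-ties formula, and no continuity correction is applied, so the numbers may differ slightly from what easyCris reports. U follows step 3 literally (Group 1 preceding Group 2); some software reports the complementary value n1 * n2 - U instead, which flips the sign of Z but leaves the two-sided p-value unchanged.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """Rank values from smallest to largest; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_normal(group1, group2):
    n1, n2 = len(group1), len(group2)

    # Step 1: pool both groups and rank them together.
    ranks = average_ranks(list(group1) + list(group2))

    # Step 2: sum the ranks for each group.
    rank_sum1 = sum(ranks[:n1])
    rank_sum2 = sum(ranks[n1:])

    # Step 3: pairs in which a Group 1 value precedes a Group 2 value
    # (ties count one half), obtained from Group 2's rank sum.
    u = rank_sum2 - n2 * (n2 + 1) / 2

    # Step 4: standardise and take a two-sided normal p-value.
    expected_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (u - expected_u) / sd_u
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))

    return {
        "rank_sum1": rank_sum1,
        "rank_sum2": rank_sum2,
        "U": u,
        "expected_U": expected_u,
        "sd_U": sd_u,
        "Z": z,
        "p": p_two_sided,
    }


if __name__ == "__main__":
    for name, value in mann_whitney_normal([1, 2, 3], [4, 5, 6]).items():
        print(name, value)
```

For the toy data in the usage example, every Group 1 value precedes every Group 2 value, so U equals n1 * n2 = 9 against an expected value of 4.5.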

References

  • Mann, H. B., & Whitney, D. R. (1947). On a test of whether one of two random variables is stochastically larger than the other. Annals of Mathematical Statistics, 18(1), 50-60.
  • Wilcoxon, F. (1945). Individual comparisons by ranking methods. Biometrics Bulletin, 1(6), 80-83.