Normality Tests

Distribution & Descriptive

Runs five complementary normality tests simultaneously (Shapiro-Wilk, Lilliefors, Anderson-Darling, Cramer-von Mises, and Jarque-Bera) to assess whether data follow a normal distribution.

When to Use

Use these tests before applying parametric methods that assume normality (t-tests, ANOVA, Pearson correlation, linear regression). For example, checking whether residuals from a regression model are normally distributed, or whether continuous measurements from a clinical trial can be analysed with a t-test.

Assumptions

  • The data are a random sample from a population.
  • Observations are independent.
  • The variable is continuous.

Required Inputs

| Input | Type | Notes |
|---|---|---|
| Values | Numeric | Continuous data to test for normality |

Output Metrics

| Metric | What it means |
|---|---|
| Shapiro-Wilk W | Test statistic. Values close to 1 indicate normality. Most powerful for small samples (N < 50). |
| Shapiro-Wilk p-value | P-value for the Shapiro-Wilk test. |
| Kolmogorov-Smirnov D | Maximum absolute difference between the empirical and theoretical normal CDFs. In easyCris this test uses the Lilliefors correction, since the normal parameters are estimated from the data. |
| Kolmogorov-Smirnov p-value | P-value for the Kolmogorov-Smirnov test shown in the UI; computed by the backend with the Lilliefors correction. |
| Anderson-Darling A-squared | Weighted measure of departure from normality, emphasising tail behaviour. |
| Anderson-Darling p-value | P-value for the Anderson-Darling test. |
| Cramer-von Mises W-squared | Integral of the squared difference between the empirical and theoretical CDFs. |
| Cramer-von Mises p-value | P-value for the Cramer-von Mises test. |
| Jarque-Bera JB | Test statistic based on skewness and excess kurtosis. Sensitive to departures in the tails. |
| Jarque-Bera p-value | P-value for the Jarque-Bera test. |
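The metrics above can be reproduced outside the tool. The sketch below is a minimal example, assuming SciPy and statsmodels are available; it uses those libraries' own test functions (note that SciPy's `anderson` reports critical values rather than a p-value, and `cramervonmises` assumes fully specified parameters, so its p-value is only approximate when the mean and standard deviation are estimated).

```python
import numpy as np
from scipy import stats
from statsmodels.stats.diagnostic import lilliefors

rng = np.random.default_rng(42)
x = rng.normal(loc=10.0, scale=2.0, size=100)  # illustrative sample

sw = stats.shapiro(x)                            # Shapiro-Wilk W and p-value
d, p_lf = lilliefors(x, dist="norm")             # Lilliefors-corrected K-S D
ad = stats.anderson(x, dist="norm")              # A-squared; SciPy gives critical values, not p
# p-value below is approximate: it assumes the normal parameters were known in advance
cvm = stats.cramervonmises(x, "norm", args=(x.mean(), x.std(ddof=1)))
jb = stats.jarque_bera(x)                        # Jarque-Bera JB and p-value

print(f"Shapiro-Wilk     W  = {sw.statistic:.4f}, p = {sw.pvalue:.4f}")
print(f"Lilliefors       D  = {d:.4f}, p = {p_lf:.4f}")
print(f"Anderson-Darling A2 = {ad.statistic:.4f}")
print(f"Cramer-von Mises W2 = {cvm.statistic:.4f}, p = {cvm.pvalue:.4f}")
print(f"Jarque-Bera      JB = {jb.statistic:.4f}, p = {jb.pvalue:.4f}")
```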

Interpretation

  • If the p-value from any test is less than alpha, normality is rejected. However, consider the consensus across all five tests rather than relying on a single test.
  • Shapiro-Wilk is generally the most powerful test for small to moderate samples (N < 2000) and is often considered the primary reference.
  • Anderson-Darling is particularly sensitive to departures in the tails, making it useful when tail behaviour matters (e.g., for extreme value analysis).
  • Jarque-Bera focuses specifically on skewness and kurtosis. A significant result pinpoints whether non-normality comes from asymmetry, heavy tails, or both.
  • Always complement formal tests with a Q-Q (quantile-quantile) plot. Visual assessment provides context that p-values alone cannot.
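A quantitative companion to visual Q-Q inspection is the correlation between the ordered data and the theoretical normal quantiles (values near 1 suggest normality). The sketch below uses SciPy's `probplot`, which also returns the points you would draw on a Q-Q plot; the exponential sample is deliberately non-normal.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.exponential(scale=1.0, size=200)  # right-skewed, clearly non-normal

# probplot returns the Q-Q points plus a least-squares fit through them;
# r is the correlation between ordered data and theoretical quantiles.
(theoretical_q, ordered_data), (slope, intercept, r) = stats.probplot(sample, dist="norm")
print(f"Q-Q correlation r = {r:.4f}")  # noticeably below 1 for skewed data
```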

Common Pitfalls

  • With large samples (N > 500), normality tests become overly sensitive and reject normality for trivial departures that do not materially affect parametric test validity. Use Q-Q plots to judge practical significance.
  • With very small samples (N < 20), normality tests have low power and may fail to detect meaningful departures from normality.
  • A non-significant result does not prove normality. It only means you do not have sufficient evidence to reject it.
  • Testing raw data for normality is often the wrong question. For t-tests and ANOVA, it is the residuals (or within-group distributions) that should be normal, not the overall data.
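The last pitfall can be demonstrated directly. In this sketch (variable names are illustrative), the raw response mixes a linear trend with normal noise, so Shapiro-Wilk rejects it, while the residuals after fitting the line are the quantity that should be tested.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=150)
y = 3.0 * x + 5.0 + rng.normal(0, 1, size=150)  # linear trend + normal noise

# Raw y is dominated by the trend, so it is far from normal...
_, p_raw = stats.shapiro(y)

# ...but the residuals after removing the fitted line should be normal.
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (slope * x + intercept)
_, p_resid = stats.shapiro(residuals)

print(f"raw y:     p = {p_raw:.4f}")   # small: trend masquerades as non-normality
print(f"residuals: p = {p_resid:.4f}")
```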

How It Works

  1. Shapiro-Wilk: Computes the ratio of the best linear estimate of variance to the usual variance estimate. A ratio close to 1 indicates normality.
  2. Lilliefors: Measures the maximum vertical distance between the empirical CDF and a normal CDF with the same mean and standard deviation.
  3. Anderson-Darling: Integrates the squared difference between the empirical and theoretical CDFs, with extra weight given to the tails.
  4. Cramer-von Mises: Integrates the squared difference between the empirical and theoretical CDFs with uniform weight, making it less tail-focused than Anderson-Darling.
  5. Jarque-Bera: Tests whether the sample skewness and excess kurtosis are jointly zero, which they would be for a normal distribution.
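The Jarque-Bera step above can be checked by hand: with sample skewness S and excess kurtosis K, the statistic is JB = (n/6)(S² + K²/4), compared against a chi-squared distribution with 2 degrees of freedom. A sketch cross-checking the formula against SciPy's implementation:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.normal(size=500)
n = len(x)

s = stats.skew(x)        # sample skewness (moment estimator)
k = stats.kurtosis(x)    # excess kurtosis (0 for a normal distribution)
jb_manual = n / 6.0 * (s**2 + k**2 / 4.0)

jb = stats.jarque_bera(x)
print(f"manual JB = {jb_manual:.4f}, scipy JB = {jb.statistic:.4f}, p = {jb.pvalue:.4f}")
```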

References

  • Shapiro, S. S., & Wilk, M. B. (1965). An analysis of variance test for normality (complete samples). Biometrika, 52(3-4), 591-611.
  • Lilliefors, H. W. (1967). On the Kolmogorov-Smirnov test for normality with mean and variance unknown. Journal of the American Statistical Association, 62(318), 399-402.
  • Anderson, T. W., & Darling, D. A. (1952). Asymptotic theory of certain "goodness of fit" criteria based on stochastic processes. Annals of Mathematical Statistics, 23(2), 193-212.
  • Cramér, H. (1928). On the composition of elementary errors. Scandinavian Actuarial Journal, 1928(1), 13-74.
  • Jarque, C. M., & Bera, A. K. (1980). Efficient tests for normality, homoscedasticity and serial independence of regression residuals. Economics Letters, 6(3), 255-259.