Normality Tests

Distribution & Descriptive

Runs five complementary normality tests simultaneously (Shapiro-Wilk, Lilliefors, Anderson-Darling, Cramer-von Mises, and Jarque-Bera) to assess whether data follow a normal distribution.

When to Use

Use these tests before applying parametric methods that assume normality (t-tests, ANOVA, Pearson correlation, linear regression). For example, checking whether residuals from a regression model are normally distributed, or whether continuous measurements from a clinical trial can be analysed with a t-test.

Assumptions

  • The data are a random sample from a population.
  • Observations are independent.
  • The variable is continuous.

Required Inputs

| Input | Type | Notes |
|---|---|---|
| Values | Numeric | Continuous data to test for normality |

Output Metrics

| Metric | What it means |
|---|---|
| Shapiro-Wilk W | Test statistic. Values close to 1 indicate normality. Most powerful for small samples (N < 50). |
| Shapiro-Wilk p-value | P-value for the Shapiro-Wilk test. |
| Kolmogorov-Smirnov D | Maximum absolute difference between the empirical and theoretical normal CDFs. In easyCris this test uses the Lilliefors correction, since the normal parameters are estimated from the data. |
| Kolmogorov-Smirnov p-value | P-value for the Kolmogorov-Smirnov test shown in the UI; computed by the backend with the Lilliefors correction. |
| Anderson-Darling A-squared | Weighted measure of departure from normality, emphasising tail behaviour. |
| Anderson-Darling p-value | P-value for the Anderson-Darling test. |
| Cramer-von Mises W-squared | Integral of the squared difference between the empirical and theoretical CDFs. |
| Cramer-von Mises p-value | P-value for the Cramer-von Mises test. |
| Jarque-Bera JB | Test statistic based on skewness and excess kurtosis. Sensitive to departures in the tails. |
| Jarque-Bera p-value | P-value for the Jarque-Bera test. |
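The metrics above can be reproduced outside the tool. The sketch below is a minimal example, assuming SciPy and statsmodels are available; it uses those libraries' own test functions (note that SciPy's `anderson` reports critical values rather than a p-value, and `cramervonmises` assumes fully specified parameters, so its p-value is only approximate when the mean and standard deviation are estimated).

```python
import numpy as np
from scipy import stats
from statsmodels.stats.diagnostic import lilliefors

rng = np.random.default_rng(42)
x = rng.normal(loc=10.0, scale=2.0, size=100)  # illustrative sample

sw = stats.shapiro(x)                            # Shapiro-Wilk W and p-value
d, p_lf = lilliefors(x, dist="norm")             # Lilliefors-corrected K-S D
ad = stats.anderson(x, dist="norm")              # A-squared; SciPy gives critical values, not p
# p-value below is approximate: it assumes the normal parameters were known in advance
cvm = stats.cramervonmises(x, "norm", args=(x.mean(), x.std(ddof=1)))
jb = stats.jarque_bera(x)                        # Jarque-Bera JB and p-value

print(f"Shapiro-Wilk     W  = {sw.statistic:.4f}, p = {sw.pvalue:.4f}")
print(f"Lilliefors       D  = {d:.4f}, p = {p_lf:.4f}")
print(f"Anderson-Darling A2 = {ad.statistic:.4f}")
print(f"Cramer-von Mises W2 = {cvm.statistic:.4f}, p = {cvm.pvalue:.4f}")
print(f"Jarque-Bera      JB = {jb.statistic:.4f}, p = {jb.pvalue:.4f}")
```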

Interpretation

  • If the p-value from any test is less than alpha, normality is rejected. However, consider the consensus across all five tests rather than relying on a single test.
  • Shapiro-Wilk is generally the most powerful test for small to moderate samples (N < 2000) and is often considered the primary reference.
  • Anderson-Darling is particularly sensitive to departures in the tails, making it useful when tail behaviour matters (e.g., for extreme value analysis).
  • Jarque-Bera focuses specifically on skewness and kurtosis. A significant result pinpoints whether non-normality comes from asymmetry, heavy tails, or both.
  • Always complement formal tests with a Q-Q (quantile-quantile) plot. Visual assessment provides context that p-values alone cannot.
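A quantitative companion to visual Q-Q inspection is the correlation between the ordered data and the theoretical normal quantiles (values near 1 suggest normality). The sketch below uses SciPy's `probplot`, which also returns the points you would draw on a Q-Q plot; the exponential sample is deliberately non-normal.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.exponential(scale=1.0, size=200)  # right-skewed, clearly non-normal

# probplot returns the Q-Q points plus a least-squares fit through them;
# r is the correlation between ordered data and theoretical quantiles.
(theoretical_q, ordered_data), (slope, intercept, r) = stats.probplot(sample, dist="norm")
print(f"Q-Q correlation r = {r:.4f}")  # noticeably below 1 for skewed data
```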

Common Pitfalls

  • With large samples (N > 500), normality tests become overly sensitive and reject normality for trivial departures that do not materially affect parametric test validity. Use Q-Q plots to judge practical significance.
  • With very small samples (N < 20), normality tests have low power and may fail to detect meaningful departures from normality.
  • A non-significant result does not prove normality. It only means you do not have sufficient evidence to reject it.
  • Testing raw data for normality is often the wrong question. For t-tests and ANOVA, it is the residuals (or within-group distributions) that should be normal, not the overall data.
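The last pitfall can be demonstrated directly. In this sketch (variable names are illustrative), the raw response mixes a linear trend with normal noise, so Shapiro-Wilk rejects it, while the residuals after fitting the line are the quantity that should be tested.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=150)
y = 3.0 * x + 5.0 + rng.normal(0, 1, size=150)  # linear trend + normal noise

# Raw y is dominated by the trend, so it is far from normal...
_, p_raw = stats.shapiro(y)

# ...but the residuals after removing the fitted line should be normal.
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (slope * x + intercept)
_, p_resid = stats.shapiro(residuals)

print(f"raw y:     p = {p_raw:.4f}")   # small: trend masquerades as non-normality
print(f"residuals: p = {p_resid:.4f}")
```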

How It Works

  1. Shapiro-Wilk: Computes the ratio of the best linear estimate of variance to the usual variance estimate. A ratio close to 1 indicates normality.
  2. Lilliefors: Measures the maximum vertical distance between the empirical CDF and a normal CDF with the same mean and standard deviation.
  3. Anderson-Darling: Integrates the squared difference between the empirical and theoretical CDFs, with extra weight given to the tails.
  4. Cramer-von Mises: Integrates the squared difference between the empirical and theoretical CDFs with uniform weight, making it less tail-focused than Anderson-Darling.
  5. Jarque-Bera: Tests whether the sample skewness and excess kurtosis are jointly zero, which they would be for a normal distribution.
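The Jarque-Bera step above can be checked by hand: with sample skewness S and excess kurtosis K, the statistic is JB = (n/6)(S² + K²/4), compared against a chi-squared distribution with 2 degrees of freedom. A sketch cross-checking the formula against SciPy's implementation:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.normal(size=500)
n = len(x)

s = stats.skew(x)        # sample skewness (moment estimator)
k = stats.kurtosis(x)    # excess kurtosis (0 for a normal distribution)
jb_manual = n / 6.0 * (s**2 + k**2 / 4.0)

jb = stats.jarque_bera(x)
print(f"manual JB = {jb_manual:.4f}, scipy JB = {jb.statistic:.4f}, p = {jb.pvalue:.4f}")
```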

References

  • Shapiro, S. S., & Wilk, M. B. (1965). An analysis of variance test for normality (complete samples). Biometrika, 52(3-4), 591-611.
  • Lilliefors, H. W. (1967). On the Kolmogorov-Smirnov test for normality with mean and variance unknown. Journal of the American Statistical Association, 62(318), 399-402.
  • Anderson, T. W., & Darling, D. A. (1952). Asymptotic theory of certain "goodness of fit" criteria based on stochastic processes. Annals of Mathematical Statistics, 23(2), 193-212.
  • Cramér, H. (1928). On the composition of elementary errors. Scandinavian Actuarial Journal, 1928(1), 13-74.
  • Jarque, C. M., & Bera, A. K. (1980). Efficient tests for normality, homoscedasticity and serial independence of regression residuals. Economics Letters, 6(3), 255-259.