Descriptive Statistics

Distribution & Descriptive

Calculates the core summary statistics shown in the app for a numeric variable: measures of central tendency, spread, and shape, plus a confidence interval for the mean.

When to Use

Use this as the first step in any analysis to understand the basic properties of your data before applying inferential tests. It answers questions like: What is the typical value? How spread out are the data? Are the data symmetric or skewed?

Assumptions

The variable is numeric (continuous or discrete).
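The summary statistics themselves require no distributional assumptions. The confidence interval for the mean assumes independent observations and either approximately normal data or a sample large enough for the sample mean to be approximately normal.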

Required Inputs

Input	Type	Notes
Values	Numeric	Data to summarise

Parameter	Default	Notes
Significance Level	0.05	Alpha level for the confidence interval for the mean. The interval's confidence level is 1 - alpha, so the default gives a 95% interval.

Output Metrics

Metric	What it means
N	Number of non-missing observations.
Mean	Arithmetic average of the data.
Median	Middle value when data are sorted. More robust to outliers than the mean.
Std Deviation	Standard deviation: roughly the typical distance of observations from the mean, in the same units as the data.
Std Error Mean	Standard error of the mean: Std Deviation / sqrt(N). Measures precision of the sample mean as an estimate of the population mean.
Variance	Square of the standard deviation.
Minimum	Smallest value in the data.
Maximum	Largest value in the data.
Range	Difference between the maximum and minimum values.
Q1	First quartile (25th percentile).
Q3	Third quartile (75th percentile).
IQR	Interquartile range (Q3 - Q1). The range of the middle 50% of the data.
Skewness	Measure of asymmetry. Near 0 = roughly symmetric, positive = right-skewed (long right tail), negative = left-skewed (long left tail). Bias-corrected.
Kurtosis	Measure of tail heaviness, reported as excess kurtosis. 0 = same as a normal distribution (mesokurtic), positive = heavier tails (leptokurtic), negative = lighter tails (platykurtic). Bias-corrected.
95% CI Lower	Lower bound of the confidence interval for the population mean. The level is 1 - alpha: 95% at the default significance level of 0.05.
95% CI Upper	Upper bound of the confidence interval for the population mean, at the same level.
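A minimal sketch of how the metrics in the table could be computed with Python's standard library. The function name describe is hypothetical, and several conventions are assumptions rather than confirmed app behaviour: the standard deviation uses an N - 1 denominator, quartiles use linear interpolation (statistics.quantiles with method="inclusive"; other quartile definitions give slightly different Q1 and Q3), and "bias-corrected" is taken to mean the adjusted Fisher-Pearson skewness and the corresponding G2 excess kurtosis. The confidence-interval bounds need a t quantile and are left out here.

```python
import math
import statistics


def describe(values):
    """Summary statistics for the Output Metrics table (conventions assumed)."""
    # Keep only non-missing values: None and NaN are dropped before anything else.
    x = [float(v) for v in values if v is not None and not math.isnan(v)]
    n = len(x)
    if n < 4:
        raise ValueError("bias-corrected kurtosis needs at least 4 values")

    mean = statistics.fmean(x)
    sd = statistics.stdev(x)  # sample SD with an N - 1 denominator (assumed)
    # Quartiles by linear interpolation between order statistics (assumed convention).
    q1, _, q3 = statistics.quantiles(x, n=4, method="inclusive")

    # Central moments with an N denominator, the building blocks of the shape measures.
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    if m2 == 0:
        raise ValueError("all values are equal; skewness and kurtosis are undefined")
    g1 = m3 / m2 ** 1.5      # moment skewness
    g2 = m4 / m2 ** 2 - 3.0  # moment excess kurtosis
    # Small-sample bias corrections (adjusted Fisher-Pearson skewness G1 and G2).
    skewness = g1 * math.sqrt(n * (n - 1)) / (n - 2)
    kurtosis = ((n + 1) * g2 + 6.0) * (n - 1) / ((n - 2) * (n - 3))

    return {
        "N": n,
        "Mean": mean,
        "Median": statistics.median(x),
        "Std Deviation": sd,
        "Std Error Mean": sd / math.sqrt(n),
        "Variance": sd ** 2,
        "Minimum": min(x),
        "Maximum": max(x),
        "Range": max(x) - min(x),
        "Q1": q1,
        "Q3": q3,
        "IQR": q3 - q1,
        "Skewness": skewness,
        "Kurtosis": kurtosis,
    }
```

For the values 1 to 10, this sketch returns Q1 = 3.25, Q3 = 7.75, skewness 0 and kurtosis -1.2. If SciPy is available, scipy.stats.skew(x, bias=False) and scipy.stats.kurtosis(x, bias=False) compute the same corrected estimators and can serve as a cross-check.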

Interpretation

Compare the mean and median: if they are close, the data are consistent with a symmetric distribution. A large difference suggests skewness, with the mean usually pulled toward the longer tail.
As a rule of thumb, skewness near 0 indicates symmetry, and values beyond +/-1 indicate substantial skewness that may warrant non-parametric methods.
Kurtosis near 0 (excess kurtosis) indicates normal-like tails. Large positive kurtosis means heavy tails and more extreme values than expected.
The IQR is more robust to outliers than the range or standard deviation. Use it to describe spread when outliers are present.
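The 95% CI gives a range of plausible values for the population mean: if the sampling were repeated many times, about 95% of intervals built this way would contain the true mean. Narrower intervals indicate more precise estimates.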
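The repeated-sampling reading of a 95% interval can be checked by simulation: draw many samples from a normal population whose mean is known, build the t-interval for each, and count how often it captures that mean. A sketch, assuming a sample size of 10 so that the critical value t(0.025, 9) = 2.262157 can be hard-coded; the population mean, SD, number of repetitions and seed are arbitrary illustrative choices, and ci_coverage is a hypothetical name.

```python
import math
import random
import statistics

N = 10             # sample size (illustrative choice)
T_CRIT = 2.262157  # t(0.025, 9): two-sided 95% critical value for N - 1 = 9 df


def ci_coverage(true_mean=50.0, true_sd=10.0, reps=2000, seed=1):
    """Fraction of simulated 95% t-intervals that contain the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # Draw one sample of size N from a normal population with known mean.
        sample = [rng.gauss(true_mean, true_sd) for _ in range(N)]
        mean = statistics.fmean(sample)
        se = statistics.stdev(sample) / math.sqrt(N)
        # Count the interval as a hit if it brackets the true mean.
        if mean - T_CRIT * se <= true_mean <= mean + T_CRIT * se:
            hits += 1
    return hits / reps


print(f"coverage: {ci_coverage():.3f}")  # expect a value near 0.95
```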

Common Pitfalls

The mean is sensitive to outliers. A single extreme value can shift the mean substantially. Use the median for skewed data.
Standard deviation is also sensitive to outliers. The IQR is a more robust alternative.
Descriptive statistics alone cannot tell you whether differences are statistically significant. They provide context but not inference.
Beware of Simpson's paradox: aggregate descriptive statistics can mask opposite patterns within subgroups.
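A small illustration of the first two pitfalls, using made-up data in which one value is a hypothetical data-entry error. The helper name summary is hypothetical, and quartiles use statistics.quantiles with method="inclusive", an assumed convention.

```python
import statistics


def summary(values):
    """Outlier-sensitive (mean, SD) versus robust (median, IQR) summaries."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.fmean(values),
        "sd": statistics.stdev(values),
        "median": statistics.median(values),
        "iqr": q3 - q1,
    }


clean = [12, 14, 15, 15, 16, 17, 18]
with_outlier = clean + [95]  # hypothetical typo, e.g. 9.5 entered as 95

for label, data in (("clean", clean), ("with outlier", with_outlier)):
    s = summary(data)
    print(f"{label:>12}: mean={s['mean']:.2f} sd={s['sd']:.2f} "
          f"median={s['median']:.2f} iqr={s['iqr']:.2f}")
```

Adding the single value 95 moves the mean from about 15.3 to 25.25 and the standard deviation from about 2.0 to about 28.2, while the median only shifts from 15 to 15.5 and the IQR from 2.0 to 2.5.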

How It Works

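Compute central tendency: the mean as the arithmetic average, and the median as the middle value of the sorted data (the average of the two middle values when N is even).
Compute spread: the standard deviation and variance from deviations around the mean, and the range and IQR from the sorted values.
Compute shape measures (skewness, kurtosis) from the standardised third and fourth moments, with a small-sample bias correction.
Construct the confidence interval for the mean using the t-distribution: mean +/- t(alpha/2, N-1) * standard error, where t(alpha/2, N-1) is the upper alpha/2 critical value of Student's t with N - 1 degrees of freedom (about 2.26 for N = 10 at alpha = 0.05).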
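The confidence-interval step can be sketched with only the standard library. Because Python has no built-in t quantile, this version computes one: the t CDF is evaluated through the regularized incomplete beta function (continued-fraction method, as in Numerical Recipes), and the critical value is found by bisection. All function names are hypothetical; in practice a library routine such as scipy.stats.t.ppf(1 - alpha / 2, n - 1) would normally replace t_critical.

```python
import math
import statistics


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        step = d * c
        h *= step
        if abs(step - 1.0) < eps:
            break
    return h


def _reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    # Use the continued fraction where it converges quickly, else the symmetry relation.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_cdf(t, df):
    """CDF of Student's t: P(T > |t|) = 0.5 * I_{df/(df+t^2)}(df/2, 1/2)."""
    tail = 0.5 * _reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))
    return 1.0 - tail if t >= 0 else tail


def t_critical(alpha, df):
    """Upper alpha/2 critical value t(alpha/2, df), found by bisection."""
    target = 1.0 - alpha / 2.0
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:  # widen the bracket until it contains the root
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def mean_ci(values, alpha=0.05):
    """(1 - alpha) confidence interval for the mean: mean +/- t(alpha/2, N-1) * SE."""
    n = len(values)
    mean = statistics.fmean(values)
    se = statistics.stdev(values) / math.sqrt(n)
    t = t_critical(alpha, n - 1)
    return mean - t * se, mean + t * se
```

For example, mean_ci([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]) gives roughly (3.33, 7.67), and t_critical(0.05, 9) is about 2.262.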

