
Descriptive Statistics

Distribution & Descriptive

Calculates the core summary statistics shown in the app for a numeric variable: central tendency, spread, shape, range, and a confidence interval for the mean.

When to Use

Use this as the first step in any analysis to understand the basic properties of your data before applying inferential tests. It answers questions like: What is the typical value? How spread out are the data? Are the data symmetric or skewed?

Assumptions

  • The variable is numeric (continuous or discrete).
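  • The summary statistics themselves require no distributional assumptions. The t-based confidence interval for the mean assumes independent observations that are either approximately normal or numerous enough for the sample mean to be approximately normal.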

Required Inputs

Input | Type | Notes
Values | Numeric | Data to summarise

Parameter | Default | Notes
Significance Level | 0.05 | Alpha level for the confidence interval for the mean. The interval's confidence level is 1 - alpha (95% at the default).
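The Significance Level feeds straight into the confidence interval: the interval uses the critical value t(1 - alpha/2, N - 1), so a smaller alpha gives a wider interval. The sketch below shows that relationship, assuming the app uses the standard t-based interval built on the sample standard deviation (N - 1 denominator). To stay dependency-free it finds the critical value by numerically integrating the t density; the helper names `t_critical` and `mean_ci` are hypothetical, and in practice a library quantile function (for example SciPy's `scipy.stats.t.ppf`) would replace the numerical part.

```python
import math
import statistics


def _t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def _t_cdf(x: float, df: int, steps: int = 1000) -> float:
    """P(T <= x) for x >= 0: Simpson's rule on [0, x] plus the lower half (0.5)."""
    h = x / steps
    total = _t_pdf(0.0, df) + _t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * _t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_critical(alpha: float, df: int) -> float:
    """Upper critical value t(1 - alpha/2, df), found by bisection."""
    target = 1 - alpha / 2
    lo, hi = 0.0, 1.0
    while _t_cdf(hi, df) < target:  # widen the bracket until it contains the quantile
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if _t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def mean_ci(values, alpha: float = 0.05):
    """(1 - alpha) t-based confidence interval for the population mean."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.fmean(values)
    se = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    t = t_critical(alpha, n - 1)
    return mean - t * se, mean + t * se
```

For the illustrative data `[2, 4, 4, 4, 5, 5, 7, 9]` (mean 5), `mean_ci(...)` returns roughly `(3.21, 6.79)` at the default alpha of 0.05; lowering alpha to 0.01 widens the interval.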

Output Metrics

Metric | What it means
N | Number of non-missing observations.
Mean | Arithmetic average of the data.
Median | Middle value when the data are sorted (the average of the two middle values when N is even). More robust to outliers than the mean.
Std Deviation | Standard deviation: the square root of the variance; roughly, the typical distance of observations from the mean.
Std Error Mean | Standard error of the mean: Std Deviation / sqrt(N). Measures the precision of the sample mean as an estimate of the population mean.
Variance | Square of the standard deviation.
Minimum | Smallest value in the data.
Maximum | Largest value in the data.
Range | Difference between the maximum and minimum values.
Q1 | First quartile (25th percentile).
Q3 | Third quartile (75th percentile).
IQR | Interquartile range (Q3 - Q1): the range spanned by the middle 50% of the data.
Skewness | Measure of asymmetry. 0 = symmetric, positive = right-skewed (long right tail), negative = left-skewed (long left tail). Bias-corrected.
Kurtosis | Measure of tail heaviness, reported as excess kurtosis. 0 = normal (mesokurtic), positive = heavier tails (leptokurtic), negative = lighter tails (platykurtic). Bias-corrected.
95% CI Lower | Lower bound of the confidence interval for the population mean. The confidence level is 1 - alpha, i.e. 95% at the default Significance Level of 0.05.
95% CI Upper | Upper bound of the same confidence interval for the population mean.
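The location, spread, and range rows of the table can be reproduced with Python's standard library. This is a sketch under stated assumptions, not the app's implementation: it assumes the sample (N - 1) variance, and because the app's quartile method is not documented, it uses `statistics.quantiles(..., method="inclusive")` (linear interpolation between order statistics); other quartile conventions can give slightly different Q1, Q3, and IQR. The function name `describe` is hypothetical.

```python
import math
import statistics


def describe(values):
    """Location, spread, and range metrics for a list of numbers.

    Missing entries (None or NaN) are dropped first, so N counts
    non-missing observations. Quartiles use linear interpolation
    (method="inclusive"), an assumption about the app's convention.
    """
    data = sorted(v for v in values if v is not None and not math.isnan(v))
    n = len(data)
    var = statistics.variance(data)  # sample variance, N - 1 denominator
    sd = math.sqrt(var)
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "N": n,
        "Mean": statistics.fmean(data),
        "Median": statistics.median(data),
        "Std Deviation": sd,
        "Std Error Mean": sd / math.sqrt(n),
        "Variance": var,
        "Minimum": data[0],
        "Maximum": data[-1],
        "Range": data[-1] - data[0],
        "Q1": q1,
        "Q3": q3,
        "IQR": q3 - q1,
    }
```

With `describe([2, 4, 4, 4, 5, 5, 7, 9, None])` the missing entry is skipped, giving N = 8, Mean = 5.0, Median = 4.5, Q1 = 4.0, Q3 = 5.5, and IQR = 1.5.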

Interpretation

  • Compare the mean and median: if they are close, the distribution may be roughly symmetric. A large gap suggests skewness, with the mean pulled toward the longer tail.
  • Skewness near 0 is consistent with symmetry. As a common rule of thumb, values beyond +/-1 indicate substantial skewness that may warrant non-parametric methods.
  • Kurtosis near 0 (excess kurtosis) indicates normal-like tails. Large positive kurtosis means heavy tails: more extreme values than a normal distribution would produce.
  • The IQR is more robust to outliers than the range or standard deviation. Use it to describe spread when outliers are present.
  • The 95% CI for the mean gives a range of plausible values for the population mean: if sampling were repeated many times, about 95% of intervals constructed this way would contain the true mean. Narrower intervals indicate more precise estimates.

Common Pitfalls

  • The mean is sensitive to outliers. A single extreme value can shift the mean substantially. Use the median for skewed data.
  • Standard deviation is also sensitive to outliers. The IQR is a more robust alternative.
  • Descriptive statistics alone cannot tell you whether differences are statistically significant. They provide context but not inference.
  • Beware of Simpson's paradox: aggregate descriptive statistics can mask opposite patterns within subgroups.
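The first two pitfalls are easy to see with a small made-up sample in which the largest value is replaced by an extreme one. The sketch below uses Python's `statistics` module; the helper name `location_and_spread` is hypothetical, and the IQR uses the same linear-interpolation quartiles (`method="inclusive"`) as an assumed convention.

```python
import statistics


def location_and_spread(data):
    """Return (mean, median, sd, iqr) for a list of numbers."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (statistics.fmean(data), statistics.median(data),
            statistics.stdev(data), q3 - q1)


clean = [10, 11, 12, 13, 14, 15, 16]
with_outlier = clean[:-1] + [160]  # same data, largest value replaced by an extreme one

for label, data in [("clean", clean), ("with outlier", with_outlier)]:
    mean, median, sd, iqr = location_and_spread(data)
    print(f"{label:>12}: mean={mean:6.2f} median={median:5.1f} sd={sd:6.2f} iqr={iqr:4.1f}")
```

One extreme value moves the mean from 13 to about 33.6 and the standard deviation from about 2.2 to about 55.8, while the median (13) and the IQR (3.0) do not change at all.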

How It Works

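  1. Compute central tendency: the mean by averaging the values, and the median as the middle value of the sorted data.
  2. Compute spread: the variance and standard deviation from squared deviations around the mean, and the range, quartiles, and IQR from the sorted data.
  3. Compute shape measures (skewness, kurtosis) from the standardised third and fourth central moments, with small-sample bias corrections applied.
  4. Construct the (1 - alpha) confidence interval for the mean (95% by default) using the t-distribution: mean +/- t(1 - alpha/2, N - 1) * standard error, where t(1 - alpha/2, N - 1) is the upper critical value of Student's t with N - 1 degrees of freedom.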
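Step 3 can be sketched as follows. The page only says the shape measures are "bias-corrected"; this sketch assumes that means the adjusted Fisher-Pearson estimators G1 (skewness) and G2 (excess kurtosis), the same definitions SciPy uses for `scipy.stats.skew(..., bias=False)` and `scipy.stats.kurtosis(..., bias=False)`. The function names are hypothetical.

```python
import math


def central_moment(data, k):
    """k-th central moment with a 1/N denominator."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** k for x in data) / n


def skewness(data):
    """Adjusted Fisher-Pearson skewness G1 (assumed correction; needs N >= 3)."""
    n = len(data)
    if n < 3:
        raise ValueError("skewness needs at least three observations")
    m2, m3 = central_moment(data, 2), central_moment(data, 3)
    g1 = m3 / m2 ** 1.5  # standardised third moment
    return math.sqrt(n * (n - 1)) / (n - 2) * g1


def excess_kurtosis(data):
    """Bias-corrected excess kurtosis G2 (assumed correction; needs N >= 4)."""
    n = len(data)
    if n < 4:
        raise ValueError("kurtosis needs at least four observations")
    m2, m4 = central_moment(data, 2), central_moment(data, 4)
    g2 = m4 / m2 ** 2 - 3  # standardised fourth moment minus 3 (0 for a normal)
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)
```

For the evenly spaced values `[1, 2, 3, 4, 5]`, skewness is 0 and excess kurtosis is -1.2 (lighter tails than a normal), while a right-skewed sample such as `[1, 1, 1, 2, 10]` gives positive skewness.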
