Descriptive Statistics
Distribution & Descriptive
Calculates the core summary statistics shown in the app for a numeric variable, including central tendency, spread, shape, range, and confidence limits.
When to Use
Use this as the first step in any analysis to understand the basic properties of your data before applying inferential tests. It answers questions like: What is the typical value? How spread out are the data? Are the data symmetric or skewed?
Assumptions
- The variable is numeric (continuous or discrete).
- No specific distributional assumptions are required.
Required Inputs
| Input | Type | Notes |
|---|---|---|
| Values | Numeric | Data to summarise |

Parameters
| Parameter | Default | Description |
|---|---|---|
| Significance Level | 0.05 | Alpha level used to compute the confidence interval for the mean. The confidence level is 1 - alpha, so the default of 0.05 gives a 95% interval. |
Output Metrics
| Metric | What it means |
|---|---|
| N | Number of non-missing observations. |
| Mean | Arithmetic average of the data. |
| Median | Middle value when data are sorted. More robust to outliers than the mean. |
| Std Deviation | Standard deviation: typical distance of observations from the mean, computed as the square root of the variance. |
| Std Error Mean | Standard error of the mean: Std Deviation / sqrt(N). Measures precision of the sample mean as an estimate of the population mean. |
| Variance | Square of the standard deviation. |
| Minimum | Smallest value in the data. |
| Maximum | Largest value in the data. |
| Range | Difference between the maximum and minimum values. |
| Q1 | First quartile (25th percentile). |
| Q3 | Third quartile (75th percentile). |
| IQR | Interquartile range (Q3 - Q1). The range of the middle 50% of the data. |
| Skewness | Measure of asymmetry. 0 = symmetric, positive = right-skewed (long right tail), negative = left-skewed. Bias-corrected. |
| Kurtosis | Measure of tail heaviness (excess kurtosis). 0 = normal (mesokurtic), positive = heavier tails (leptokurtic), negative = lighter tails (platykurtic). Bias-corrected. |
| 95% CI Lower | Lower bound of the confidence interval for the population mean (95% at the default significance level of 0.05). |
| 95% CI Upper | Upper bound of the confidence interval for the population mean (95% at the default significance level of 0.05). |
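
As a rough illustration of how the location and spread metrics in the table relate to one another, the sketch below computes them with Python's standard `statistics` module. The function name `describe` is hypothetical, and two choices are assumptions rather than the app's documented behaviour: the standard deviation uses the sample (N - 1) denominator, and quartiles use the "inclusive" interpolation method. Other tools may use a different quartile convention and report slightly different Q1, Q3, and IQR values.

```python
import math
import statistics


def describe(values):
    """Return basic location and spread statistics for a list of numbers."""
    data = sorted(values)
    n = len(data)
    mean = statistics.fmean(data)
    sd = statistics.stdev(data)  # sample SD with N - 1 denominator (assumed)
    # Quartile conventions differ between tools; "inclusive" is one common choice.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "N": n,
        "Mean": mean,
        "Median": statistics.median(data),
        "Std Deviation": sd,
        "Std Error Mean": sd / math.sqrt(n),
        "Variance": sd ** 2,
        "Minimum": data[0],
        "Maximum": data[-1],
        "Range": data[-1] - data[0],
        "Q1": q1,
        "Q3": q3,
        "IQR": q3 - q1,
    }


summary = describe([2, 4, 4, 5, 7, 9, 10, 12])
for name, value in summary.items():
    print(f"{name}: {value}")
```

Missing values are assumed to have been removed before the call, which matches the definition of N as the number of non-missing observations.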
Interpretation
- Compare the mean and median: if they are close, the distribution is approximately symmetric. A large difference suggests skewness.
- Skewness near 0 indicates symmetry. As a rule of thumb, values beyond +/-1 indicate substantial skewness that may warrant non-parametric methods.
- Kurtosis near 0 (excess kurtosis) indicates normal-like tails. Large positive kurtosis means heavy tails and more extreme values than expected.
- The IQR is more robust to outliers than the range or standard deviation. Use it to describe spread when outliers are present.
- The 95% CI for the mean gives a range of plausible values for the true population mean: across repeated samples, about 95% of intervals built this way would contain it. Narrower intervals indicate more precise estimates.
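
To make the skewness and kurtosis guidance concrete, here is a minimal sketch of one widely used pair of bias-corrected estimators (the adjusted Fisher-Pearson skewness and the matching excess kurtosis). The page only says the values are bias-corrected, so treating these particular formulas as the ones used is an assumption, and the function names are hypothetical.

```python
import math


def sample_skewness(values):
    """Bias-corrected skewness (adjusted Fisher-Pearson); needs N >= 3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    g1 = m3 / m2 ** 1.5                            # uncorrected skewness
    return math.sqrt(n * (n - 1)) / (n - 2) * g1


def sample_excess_kurtosis(values):
    """Bias-corrected excess kurtosis; needs N >= 4."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    g2 = m4 / m2 ** 2 - 3                          # uncorrected excess kurtosis
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)


symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]  # one large value creates a long right tail

print(sample_skewness(symmetric), sample_excess_kurtosis(symmetric))
print(sample_skewness(right_skewed), sample_excess_kurtosis(right_skewed))
```

The symmetric sample gives a skewness of zero, while the sample with a single large value gives a clearly positive skewness, which is the pattern described above for right-skewed data.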
Common Pitfalls
- The mean is sensitive to outliers. A single extreme value can shift the mean substantially. Use the median for skewed data.
- Standard deviation is also sensitive to outliers. The IQR is a more robust alternative.
- Descriptive statistics alone cannot tell you whether differences are statistically significant. They provide context but not inference.
- Beware of Simpson's paradox: aggregate descriptive statistics can mask opposite patterns within subgroups.
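
The outlier sensitivity described in the first two pitfalls can be seen in a small experiment. The sketch below uses made-up data and a hypothetical helper, `spread_summary`, and assumes the "inclusive" quartile method. It replaces one value with an extreme one and compares the summaries before and after.

```python
import statistics


def spread_summary(values):
    """Return mean, median, sample SD, and IQR for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),
        "iqr": q3 - q1,
    }


clean = [10, 11, 12, 13, 14, 15, 16]
with_outlier = clean[:-1] + [160]  # swap the largest value for an extreme one

before = spread_summary(clean)
after = spread_summary(with_outlier)

for key in before:
    print(f"{key}: {before[key]:.2f} -> {after[key]:.2f}")
```

In this example the median and IQR are unchanged by the extreme value, while the mean and standard deviation both increase sharply.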
How It Works
- Compute central tendency measures (mean, median) by averaging or finding the middle value of sorted data.
- Compute spread measures: the SD from squared deviations around the mean, and the IQR and range from the sorted values (quartiles and extremes).
- Compute shape measures (skewness, kurtosis) from standardised third and fourth moments of the distribution.
- Construct the 95% confidence interval for the mean using the t-distribution: mean +/- t(alpha/2, N-1) * standard error, where t(alpha/2, N-1) is the two-sided critical value with N-1 degrees of freedom.
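
The confidence interval step can be sketched as follows. Python's standard library has no t-distribution quantile function, so this hypothetical helper, `mean_confidence_interval`, takes the critical value as an argument. If SciPy is available, the value could be obtained with `scipy.stats.t.ppf(1 - alpha / 2, n - 1)`. The sample data are made up for illustration.

```python
import math
import statistics


def mean_confidence_interval(values, t_crit):
    """Return (lower, upper) limits: mean +/- t_crit * standard error."""
    n = len(values)
    mean = statistics.fmean(values)
    std_error = statistics.stdev(values) / math.sqrt(n)  # sample SD / sqrt(N)
    margin = t_crit * std_error
    return mean - margin, mean + margin


# N = 10, so there are 9 degrees of freedom; the two-sided critical
# value for alpha = 0.05 is approximately 2.262.
sample = [4.1, 5.0, 5.2, 4.8, 5.5, 4.9, 5.1, 4.7, 5.3, 5.4]
lower, upper = mean_confidence_interval(sample, t_crit=2.262)
print(f"95% CI for the mean: ({lower:.3f}, {upper:.3f})")
```

A smaller significance level requires a larger critical value and therefore produces a wider interval for the same data.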
References
- Tukey, J. W. (1977). Exploratory Data Analysis. Addison-Wesley.