
Outlier Detection

Distribution & Descriptive

Identifies unusual observations in a numeric variable using the four methods reported in the app: IQR fences, Z-scores, Modified Z-scores (MAD), and Grubbs' test for a single outlier.

When to Use

Use this analysis before running inferential tests to identify observations that may distort the results: for example, to catch data entry errors, identify patients with unusual responses, or screen for a malfunctioning measurement instrument.

Assumptions

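  • The variable is numeric.
  • For the Z-score method: the data are approximately normally distributed (otherwise the Z-score threshold is poorly calibrated).
  • For the Grubbs test: the data are approximately normally distributed and at most one outlier is suspected.
  • For the IQR method: no specific distributional assumption is required.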

Required Inputs

Input    Type     Notes
Values   Numeric  Data to analyse for outliers

Parameter           Default  Notes
Significance Level  0.05     Alpha level used for the Grubbs test and in shared result formatting.

Output Metrics

Metric                What it means
Q1 (25th Percentile)  First quartile, used to compute the IQR bounds.
Q3 (75th Percentile)  Third quartile, used to compute the IQR bounds.
IQR (1.5×IQR)         IQR method row: lower and upper bounds, outlier count, flagged rows, and outlier values.
Z-Score               Z-score method row: threshold, outlier count, flagged rows, and outlier values.
Modified Z (MAD)      Median absolute deviation method row: threshold, outlier count, flagged rows, and outlier values.
Statistic (G)         Grubbs test statistic for the most extreme candidate outlier.
Critical Value        Critical value of G at the selected alpha.
Pr > |G|              P-value for the Grubbs single-outlier test.
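
For reference, the textbook two-sided form of Grubbs' test (as given, for example, in the NIST/SEMATECH e-Handbook of Statistical Methods) defines the statistic and its critical value as below, where x̄ is the sample mean, s the sample standard deviation, n the sample size, and t the upper critical value of Student's t with n − 2 degrees of freedom. Whether the app uses exactly this two-sided version is an assumption; the one-sided variant replaces α/(2n) with α/n. The most extreme value is declared an outlier when G exceeds the critical value.

```latex
G = \frac{\max_{i} \lvert x_i - \bar{x} \rvert}{s},
\qquad
G_{\text{crit}} = \frac{n-1}{\sqrt{n}}
\sqrt{\frac{t^{2}_{\alpha/(2n),\,n-2}}{\,n-2+t^{2}_{\alpha/(2n),\,n-2}\,}}
```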

Interpretation

  • An observation being flagged as an outlier does not automatically mean it should be removed. Outliers may be genuine extreme values, data entry errors, or subjects from a different population.
  • The IQR method is robust: its fences are built from the quartiles, which describe the middle 50% of the data and are barely affected by extreme values in the tails. It also makes no normality assumption.
  • The Z-score method flags values more than 3 standard deviations from the mean (the default threshold). This works well for approximately normal data but poorly for skewed distributions.
  • The Modified Z-score method uses the median and MAD, making it more robust than the classical Z-score when skewness or extreme values are present.
  • Grubbs is a single-outlier test: it is most informative when you want to check whether the single most extreme observation is unusually far from the rest.
  • If multiple methods flag the same row, there is stronger evidence that the observation is genuinely unusual.

Common Pitfalls

  • Automatically removing outliers without investigation is bad practice. Always investigate the reason for each outlier before deciding what to do.
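  • The Z-score method is prone to masking, especially in small samples: an outlier inflates the standard deviation, which shrinks its own Z-score. Using the sample standard deviation, no value in a sample of size n can have |Z| larger than (n − 1)/√n, so with n ≤ 10 the default threshold of 3 can never be exceeded.
  • With non-normal data, the Z-score method may flag too many or too few observations. Prefer the IQR or Modified Z-score method for skewed distributions.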
  • Grubbs assumes approximate normality and targets a single suspected outlier, so it is not a general-purpose replacement for the other methods.
  • Removing outliers can bias your results if the outliers are genuine observations. Consider robust statistical methods as an alternative.
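
The masking effect can be checked numerically. The sketch below uses a made-up sample of 10 readings containing one gross entry error (1000): the classical Z-score cannot reach the default threshold of 3, while a median-based modified Z-score flags the value easily. It assumes the sample standard deviation (n − 1 denominator), the conventional 0.6745 scaling constant for the modified Z-score, and 0-based row indices; the app may differ in these details.

```python
import math
import statistics


def max_abs_z(n):
    """Largest |Z| any single value can reach in a sample of size n (sample SD)."""
    return (n - 1) / math.sqrt(n)


# Hypothetical sample: nine plausible readings and one data entry error.
sample = [10, 11, 12, 11, 10, 12, 11, 10, 12, 1000]

# Classical Z-scores: the outlier inflates both the mean and the SD.
mean = statistics.fmean(sample)
sd = statistics.stdev(sample)
z_scores = [(x - mean) / sd for x in sample]

# Modified Z-scores: the median and MAD are barely moved by the outlier.
med = statistics.median(sample)
mad = statistics.median([abs(x - med) for x in sample])
mz = [0.6745 * (x - med) / mad for x in sample]

print(round(max_abs_z(len(sample)), 3))                # 2.846
print([i for i, z in enumerate(z_scores) if abs(z) > 3])   # []
print([i for i, m in enumerate(mz) if abs(m) > 3.5])       # [9]
```

Because the bound depends only on n, no choice of data can push a Z-score past 3 in a sample of 10 or fewer values.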

How It Works

  1. IQR method: Calculate Q1, Q3, and IQR = Q3 - Q1. Flag any value below Q1 - 1.5*IQR or above Q3 + 1.5*IQR.
  2. Z-score method: Standardise each value as Z = (value - mean) / SD. Flag any value with |Z| exceeding the threshold (default 3).
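  3. Modified Z-score method: Compute the median and the MAD (the median of the absolute deviations from the median), standardise each value's deviation from the median by the MAD, and flag values with |Mz| above the threshold (default 3.5).
  4. Grubbs method: Compute the Grubbs statistic G, the largest absolute deviation from the mean divided by the SD, and compare it with the critical value at the selected alpha level to test whether the single most extreme observation is an outlier.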
  5. Report the flagged values, their row positions, and the counts for each method.
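
The steps above can be sketched in Python with only the standard library. This is a minimal illustration, not the app's implementation. Several details are assumptions: the "inclusive" quartile interpolation, the sample standard deviation, the 0.6745 scaling constant in the modified Z-score (the conventional Iglewicz and Hoaglin form), and 0-based row indices. The Grubbs part computes only G; the critical value additionally needs a Student t quantile, for example from `scipy.stats.t.ppf`.

```python
import statistics


def quartiles(values):
    # Assumption: "inclusive" linear interpolation. Other percentile
    # definitions can move Q1, Q3 and therefore the fences slightly.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, q3


def iqr_outliers(values, k=1.5):
    """0-based indices of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [i for i, x in enumerate(values) if x < lower or x > upper]


def z_outliers(values, threshold=3.0):
    """0-based indices with |Z| > threshold, Z = (x - mean) / SD."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    if sd == 0:  # all values identical: Z is undefined, nothing to flag
        return []
    return [i for i, x in enumerate(values) if abs((x - mean) / sd) > threshold]


def modified_z_outliers(values, threshold=3.5):
    """0-based indices with |Mz| > threshold, Mz = 0.6745 * (x - median) / MAD."""
    med = statistics.median(values)
    mad = statistics.median([abs(x - med) for x in values])
    if mad == 0:  # more than half the values equal the median: Mz undefined
        return []
    return [i for i, x in enumerate(values)
            if abs(0.6745 * (x - med) / mad) > threshold]


def grubbs_statistic(values):
    """Return (G, index) for the value farthest from the mean; assumes SD > 0."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    idx = max(range(len(values)), key=lambda i: abs(values[i] - mean))
    return abs(values[idx] - mean) / sd, idx


data = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 12, 40]
print(iqr_outliers(data))         # [11]
print(z_outliers(data))           # [11]
print(modified_z_outliers(data))  # [11]
G, row = grubbs_statistic(data)
print(row, round(G, 3))           # 11 3.153
```

Here all three flagging methods agree on the value 40 (row index 11), which is also Grubbs' most extreme candidate.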


References

  • Tukey, J. W. (1977). Exploratory Data Analysis. Addison-Wesley.
  • Grubbs, F. E. (1950). Sample criteria for testing outlying observations. Annals of Mathematical Statistics, 21(1), 27-58.