Measures of dispersion

Measures of dispersion

Table of Contents

Introduction

Measures of dispersion quantify how much data values vary or spread out. Understanding dispersion helps you grasp the reliability, consistency, and variability of data.


1. Range

๐Ÿ“Œ Definition:

The difference between the maximum and minimum values in a dataset.

\[ \text{Range} = \text{Max} - \text{Min} \]

โœ… Best Practices:

  • Use for quick insights into data spread.
  • Best for small datasets or when outliers are not present.

๐Ÿง  Interpretation:

  • A large range suggests high variability.
  • A small range indicates data values are close together.

โš ๏ธ Limitations:

  • Highly sensitive to outliers.
  • Doesn’t reflect the distribution between the extremes.

Example:
Dataset: [10, 12, 15, 22, 100]
Range = 100 - 10 = 90


2. Variance

๐Ÿ“Œ Definition:

The average of squared deviations from the mean.

\[ \text{Variance} = \frac{1}{n} \sum_{i=1}^{n}(x_i - \bar{x})^2 \]

โœ… Best Practices:

  • Use when comparing variability across multiple datasets.
  • Ideal for understanding total dispersion mathematically.

๐Ÿง  Interpretation:

  • Higher variance = more spread out data.
  • Lower variance = values are clustered around the mean.

โš ๏ธ Limitations:

  • Units are squared, making it less interpretable in original units.
  • Sensitive to outliers.

3. Standard Deviation (SD)

๐Ÿ“Œ Definition:

The square root of variance, making it directly comparable to the original data units.

\[ \sigma = \sqrt{\text{Variance}} \]

โœ… Best Practices:

  • Preferred over variance for interpretation.
  • Use when you need insights into average deviation from the mean.

๐Ÿง  Interpretation:

  • Small SD: data is tightly packed.
  • Large SD: values are more dispersed.
  • In normal distributions: ~68% of values lie within ยฑ1 SD of the mean.

โš ๏ธ Limitations:

  • Assumes symmetry in distribution.
  • Sensitive to outliers.

4. Interquartile Range (IQR)

๐Ÿ“Œ Definition:

The range between the first (Q1) and third quartiles (Q3).

\[ \text{IQR} = Q3 - Q1 \]

โœ… Best Practices:

  • Use for skewed data or datasets with outliers.
  • Excellent for comparing central spread.

๐Ÿง  Interpretation:

  • Small IQR = concentrated data.
  • Large IQR = variable middle 50%.

โš ๏ธ Limitations:

  • Ignores data in the tails.

5. Mean Absolute Deviation (MAD)

๐Ÿ“Œ Definition:

The average of the absolute differences from the mean.

\[ \text{MAD} = \frac{1}{n} \sum_{i=1}^{n} |x_i - \bar{x}| \]

โœ… Best Practices:

  • Useful for a robust, intuitive view of variability.
  • Less affected by outliers than variance or SD.

๐Ÿง  Interpretation:

  • Smaller MAD โ†’ more consistency.
  • Larger MAD โ†’ more variation.

๐Ÿ” Comparison Table

MeasureSensitive to OutliersEasy to InterpretUses All Data PointsRobust for Skewed Data
Rangeโœ… Yesโœ… YesโŒ NoโŒ No
Varianceโœ… YesโŒ Noโœ… YesโŒ No
Standard Dev.โœ… Yesโœ… Yesโœ… YesโŒ No
IQRโŒ Noโœ… YesโŒ Noโœ… Yes
MADโš ๏ธ Moderatelyโœ… Yesโœ… Yesโœ… Yes

๐Ÿงช Real-World Use Cases

IndustryUse Case
FinanceStandard deviation of asset returns (volatility)
HealthcareVariability in patient recovery times
ManufacturingIQR and Range in quality control
EducationMAD to evaluate student performance spread
MarketingAnalyzing customer spending variability

๐Ÿง  Summary

  • Use SD or Variance for complete datasets.
  • Use IQR or MAD for skewed or outlier-prone data.
  • Always visualize for better interpretation (boxplots, histograms).
Share :