Sampling Distribution and the Central Limit Theorem
Sampling Distribution of the Sample Mean and the CLT
🎯 Background: Sample Mean as a Random Variable
Because different samples can yield different sample means, the sample mean is a random variable with a distribution.
The key question: What is the distribution of the sample mean \(\bar{X}\)?
The answer depends on two things:
- Is the original population normally distributed?
- What is the sample size n?
📊 Population Parameters
Quantities describing a distribution or population are called parameters:
- \(\mu\): population mean (also called the expected value)
- \(\sigma^2\): population variance
- \(\sigma\): population standard deviation, \(\sigma = \sqrt{\sigma^2}\)
⭐ Properties of the Sampling Distribution of the Mean
Property 1: Expected value of the sample mean
\(E(\bar{X}) = \mu_{\bar{X}} = \mu\)
The mean of all possible sample means equals the population mean
Property 2: Variance of the sample mean
\(V(\bar{X}) = \sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}\)
The variance of sample means equals the population variance divided by n
(this property requires independent observations, as in a simple random sample)
Property 3: Standard Error
\(\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}\)
The standard deviation of the sample mean is called the "standard error"
💡 Key insight: There is an inverse relationship between sample size and variance of sample means.
Larger sample → smaller variance → means more concentrated around μ
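Properties 1 and 2 follow from linearity of expectation and, for the variance, from independence of the observations. A short derivation, assuming \(X_1, \dots, X_n\) are independent draws from the population, each with mean μ and variance σ² (the notation \(X_i\) is introduced here for illustration):
\(E(\bar{X}) = E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{1}{n} \cdot n\mu = \mu\)
\(V(\bar{X}) = V\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\sum_{i=1}^{n} V(X_i) = \frac{1}{n^2} \cdot n\sigma^2 = \frac{\sigma^2}{n}\)
The second equality in the variance line is where independence is used; without it, covariance terms between the \(X_i\) would appear.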
📈 Effect of Sample Size on Variance
Conclusion: As sample size increases, the sampling distribution of the mean becomes:
- narrower (smaller variance)
- more concentrated around the population mean μ
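The narrowing can be checked empirically. The sketch below is an illustration under stated assumptions, not part of the original notes: it draws repeated samples from a standard normal population (μ = 0, σ = 1, chosen arbitrarily), and the helper name `simulate_sample_means`, the repetition count, and the seed are all hypothetical choices.

```python
import random
import statistics


def simulate_sample_means(n, reps=5000, mu=0.0, sigma=1.0, seed=0):
    """Draw `reps` samples of size n from N(mu, sigma^2); return their means."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(reps)
    ]


# Compare the empirical variance of the sample means with sigma^2 / n.
for n in (4, 16, 64):
    means = simulate_sample_means(n)
    empirical = statistics.pvariance(means)
    theoretical = 1.0 / n  # sigma^2 / n with sigma = 1
    print(f"n={n:3d}  empirical={empirical:.4f}  theoretical={theoretical:.4f}")
```

Each fourfold increase in n should cut the variance of the sample means to roughly a quarter, which is the same as halving the standard error.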
🔔 Case 1: Sampling from a Normal Distribution
If: we sample from a population where the variable is normally distributed with mean μ and variance σ²
Then: the sample mean is also normally distributed!
\(\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)\)
💡 Note: In this case the sample mean is normally distributed for any sample size n, even if n is small!
🌟 Central Limit Theorem (CLT)
Theorem:
If a population has any distribution (not necessarily normal!) with mean μ and finite variance σ²,
then for a sufficiently large random sample, the sample mean is approximately normally distributed:
\(\frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \xrightarrow{d} N(0, 1)\) as \(n \to \infty\), so for large n: \(\bar{X} \overset{\text{approx.}}{\sim} N\left(\mu, \frac{\sigma^2}{n}\right)\)
🎯 This is one of the most important theorems in statistics!
❓ When Is the Sample "Large Enough"?
Rule of thumb: \(n \geq 30\) is usually sufficient
But it depends on the original distribution:
- Symmetric distribution: even relatively small n (15–20) may suffice
- Asymmetric distribution: need larger n (30+)
- Highly asymmetric distribution: need very large n (50+)
📊 Illustration of the Central Limit Theorem
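One way to see the theorem at work is a small simulation. The sketch below assumes an Exponential(1) population, which is strongly right-skewed (this choice, the function name `sample_mean_skewness`, the repetition count, and the seed are illustrative assumptions, not from the notes). It measures the skewness of the simulated sample means: a value near 0 is consistent with a symmetric, normal-like shape.

```python
import random
import statistics


def sample_mean_skewness(n, reps=5000, seed=1):
    """Skewness of `reps` sample means, each from n Exponential(1) draws."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    means = [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(reps)
    ]
    center = statistics.fmean(means)
    spread = statistics.pstdev(means)
    # Third standardized moment: 0 for a perfectly symmetric distribution.
    return statistics.fmean(((m - center) / spread) ** 3 for m in means)


for n in (1, 5, 30, 100):
    print(f"n={n:3d}  skewness of sample means = {sample_mean_skewness(n):.2f}")
```

The skewness should shrink toward 0 as n grows, while remaining visibly positive at small n. This is why a skewed population needs a larger sample before the normal approximation becomes reasonable.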
🧮 Computing the Z-Score for a Sample Mean
\(Z = \frac{\bar{X} - \mu}{\frac{\sigma}{\sqrt{n}}} = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}\)
💡 Note the difference:
- For a single observation X: \(Z = \frac{X - \mu}{\sigma}\)
- For a sample mean \(\bar{X}\): \(Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}\)
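The two formulas differ only in the denominator. A minimal sketch, with hypothetical function names `z_single` and `z_mean` and made-up illustrative values (μ = 100, σ = 15, n = 25):

```python
import math


def z_single(x, mu, sigma):
    """Z-score of a single observation: distance from mu in units of sigma."""
    return (x - mu) / sigma


def z_mean(x_bar, mu, sigma, n):
    """Z-score of a sample mean: distance from mu in units of the standard error."""
    standard_error = sigma / math.sqrt(n)
    return (x_bar - mu) / standard_error


# The same value, 106, is unremarkable as a single observation
# but much further out in the distribution of means of 25 observations.
print(z_single(106, mu=100, sigma=15))
print(z_mean(106, mu=100, sigma=15, n=25))
```

For the same deviation from μ, the Z-score of a sample mean is \(\sqrt{n}\) times larger than that of a single observation, because the standard error is smaller than σ by that factor.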
✏️ Detailed Example
Question: Birth weight is distributed with μ = 3.2 kg and σ = 0.5 kg.
A random sample of 36 babies is taken. What is the probability that the sample mean exceeds 3.35 kg? (Since n = 36 ≥ 30, the CLT justifies treating \(\bar{X}\) as approximately normal.)
Step 1: Identify the data
\(\mu = 3.2, \quad \sigma = 0.5, \quad n = 36\)
Step 2: Compute the standard error
\(\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{0.5}{\sqrt{36}} = \frac{0.5}{6} \approx 0.0833\)
Step 3: Compute the Z-score
\(Z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \frac{3.35 - 3.2}{0.5 / 6} = \frac{0.15 \cdot 6}{0.5} = 1.8\)
Step 4: Compute the probability
\(P(\bar{X} > 3.35) = P(Z > 1.8) = 1 - P(Z \leq 1.8) = 1 - 0.9641 = 0.0359\)
Answer: the probability is approx. 3.59%
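The four steps can be reproduced without a Z-table. The sketch below uses `NormalDist` from Python's standard `statistics` module; the variable names are illustrative choices, and the normality of \(\bar{X}\) is the CLT approximation rather than an exact result.

```python
import math
from statistics import NormalDist

# Step 1: the data from the question
mu, sigma, n = 3.2, 0.5, 36
threshold = 3.35

# Step 2: standard error of the sample mean
standard_error = sigma / math.sqrt(n)

# Step 3: Z-score of the threshold
z = (threshold - mu) / standard_error

# Step 4: upper-tail probability under the standard normal
p_exceeds = 1 - NormalDist().cdf(z)

# Equivalent shortcut: work directly with the distribution of the sample mean
p_direct = 1 - NormalDist(mu, standard_error).cdf(threshold)

print(f"SE = {standard_error:.4f}, Z = {z:.2f}, P = {p_exceeds:.4f}")
```

Both routes give the same probability; the second skips the explicit standardization by placing the standard error directly in the distribution of \(\bar{X}\).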
📋 Summary Table
| Case | Distribution of sample mean |
|---|---|
| Population is normal | \(\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)\) exactly, for any n |
| Population not normal, n large (≥30) | \(\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)\) approximately (CLT) |
| Population not normal, n small | Normal approximation is unreliable; the distribution of \(\bar{X}\) depends on the population's shape |
📝 Key Formulas
\(E(\bar{X}) = \mu\)
\(V(\bar{X}) = \frac{\sigma^2}{n}\)
\(\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}\) (standard error)
\(Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}\)