Sampling Distribution – Sample Mean and CLT | G11

Sampling Distribution of the Sample Mean and the CLT

🎯 Background: Sample Mean as a Random Variable

Because different samples can yield different sample means, the sample mean is a random variable with a distribution.

The key question: What is the distribution of the sample mean \(\bar{X}\)?

The answer depends on two things:

  • Is the original population normally distributed?
  • What is the sample size n?

📊 Population Parameters

Quantities describing a distribution or population are called parameters:

  • \(\mu\): population mean (also called the expected value)
  • \(\sigma^2\): population variance
  • \(\sigma\): population standard deviation, \(\sigma = \sqrt{\sigma^2}\)

⭐ Properties of the Sampling Distribution of the Mean

Property 1: Expected value of the sample mean

\(E(\bar{X}) = \mu_{\bar{X}} = \mu\)

The mean of all possible sample means equals the population mean

Property 2: Variance of the sample mean

\(V(\bar{X}) = \sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}\)

The variance of sample means equals the population variance divided by n

(this property requires the observations to be independent, as in a simple random sample)
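
Why the division by n appears: a short derivation sketch, assuming the observations \(X_1, \dots, X_n\) are independent and each has variance \(\sigma^2\) (the index i and the summation notation are introduced here for illustration):

\(V(\bar{X}) = V\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\sum_{i=1}^{n} V(X_i) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}\)

The second equality is where independence is used: the variance of a sum equals the sum of the variances only when the covariance terms vanish.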

Property 3: Standard Error

\(\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}\)

The standard deviation of the sample mean is called the "standard error"

💡 Key insight: There is an inverse relationship between sample size and variance of sample means.

Larger sample → smaller variance → means more concentrated around μ

📈 Effect of Sample Size on Variance

[Figure: sampling distribution of the mean for n = 10, n = 30, and n = 100, each centered at μ]

Conclusion: As sample size increases, the sampling distribution of the mean becomes:

  • narrower (smaller variance)
  • more concentrated around the population mean μ
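
The shrinking variance can be checked with a small simulation. The sketch below is illustrative only: it assumes a standard normal population (μ = 0, σ = 1), and the function name `sample_mean_variance`, the number of repetitions, and the seed are arbitrary choices, not part of the lesson.

```python
import random
import statistics


def sample_mean_variance(n, num_samples=5000, mu=0.0, sigma=1.0, seed=0):
    """Estimate the variance of the sample mean by repeated sampling."""
    rng = random.Random(seed)
    # Draw num_samples samples of size n and record each sample's mean.
    means = [
        statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(num_samples)
    ]
    return statistics.pvariance(means)


for n in (10, 30, 100):
    simulated = sample_mean_variance(n)
    theoretical = 1.0 / n  # sigma^2 / n with sigma = 1
    print(f"n={n:3d}  simulated={simulated:.4f}  theoretical={theoretical:.4f}")
```

The simulated values should land close to σ²/n, with small differences due to random sampling.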

🔔 Case 1: Sampling from a Normal Distribution

If: we sample from a population where the variable is normally distributed with mean μ and variance σ²

Then: the sample mean is also normally distributed!

\(\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)\)

💡 Note: In this case the sample mean is normally distributed for any sample size n, even if n is small!

🌟 Central Limit Theorem (CLT)

Theorem:

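If a population has any distribution (not necessarily normal!) with mean μ and finite variance σ²,

then for a sufficiently large sample size n, the sample mean is approximately normally distributed:

\(\bar{X} \overset{\text{approx.}}{\sim} N\left(\mu, \frac{\sigma^2}{n}\right) \quad \text{for large } n\)

Equivalently, the standardized sample mean converges in distribution to the standard normal:

\(\frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \xrightarrow{n \to \infty} N(0, 1)\)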

🎯 This is one of the most important theorems in statistics!

❓ When Is the Sample "Large Enough"?

Rule of thumb: \(n \geq 30\) is usually sufficient

But it depends on the original distribution:

  • Symmetric distribution: even relatively small n (15–20) may suffice
  • Asymmetric distribution: need larger n (30+)
  • Highly asymmetric distribution: need very large n (50+)
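
One way to see why asymmetric populations need a larger n is to measure how lopsided the distribution of sample means still is. The sketch below assumes an exponential population with rate 1 (a strongly right-skewed distribution chosen only for illustration). The helper names `skewness` and `skewness_of_sample_means` are hypothetical. For independent observations, the skewness of the sample mean equals the population skewness divided by \(\sqrt{n}\), so it should fade as n grows.

```python
import random
import statistics


def skewness(values):
    """Skewness: mean cubed deviation divided by the standard deviation cubed."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([(v - m) ** 3 for v in values]) / sd ** 3


def skewness_of_sample_means(n, num_samples=5000, seed=1):
    """Simulate sample means from an exponential(1) population and return their skewness."""
    rng = random.Random(seed)
    means = [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(num_samples)
    ]
    return skewness(means)


for n in (5, 30, 100):
    print(f"n={n:3d}  skewness of sample means = {skewness_of_sample_means(n):.2f}")
```

A skewness near 0 indicates a roughly symmetric, bell-like shape. The printed values should move toward 0 as n increases.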

📊 Illustration of the Central Limit Theorem

[Figure: an asymmetric original distribution; for samples with large n, the distribution of the sample mean is approximately normal and centered at μ]

The message of the Central Limit Theorem: no matter what the original distribution looks like, if we take enough observations (large n), the sample mean will be approximately normally distributed!

🧮 Computing the Z-Score for a Sample Mean

\(Z = \frac{\bar{X} - \mu}{\frac{\sigma}{\sqrt{n}}} = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}\)

💡 Note the difference:

  • For a single observation X:   \(Z = \frac{X - \mu}{\sigma}\)
  • For a sample mean \(\bar{X}\):   \(Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}\)
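
The two formulas differ only in the denominator. A minimal sketch follows: the function names `z_single` and `z_sample_mean` and the numbers μ = 100, σ = 15, n = 25 are made up for illustration.

```python
import math


def z_single(x, mu, sigma):
    """Z-score of a single observation: distance from mu in units of sigma."""
    return (x - mu) / sigma


def z_sample_mean(x_bar, mu, sigma, n):
    """Z-score of a sample mean: distance from mu in units of the standard error."""
    return (x_bar - mu) / (sigma / math.sqrt(n))


# Hypothetical population: mu = 100, sigma = 15
mu, sigma = 100.0, 15.0
print(z_single(106.0, mu, sigma))           # 0.4
print(z_sample_mean(106.0, mu, sigma, 25))  # 2.0
```

The same value, 106, is unremarkable for a single observation but far more unusual as the mean of 25 observations, because the standard error (3) is much smaller than σ (15).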

✏️ Detailed Example

Question: Birth weight is distributed with μ = 3.2 kg and σ = 0.5 kg.

A sample of 36 babies is taken. What is the probability the sample mean exceeds 3.35 kg?

Step 1: Identify the data

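\(\mu = 3.2, \quad \sigma = 0.5, \quad n = 36\)

Because \(n = 36 \geq 30\), the CLT lets us treat \(\bar{X}\) as approximately normal even though the shape of the birth-weight distribution is not specified.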

Step 2: Compute the standard error

\(\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{0.5}{\sqrt{36}} = \frac{0.5}{6} \approx 0.0833\)

Step 3: Compute the Z-score

\(Z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \frac{3.35 - 3.2}{0.0833} = \frac{0.15}{0.0833} \approx 1.8\)

Step 4: Compute the probability

\(P(\bar{X} > 3.35) = P(Z > 1.8) = 1 - P(Z \leq 1.8) = 1 - 0.9641 = 0.0359\)

Answer: the probability is approx. 3.59%
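
The four steps above can be reproduced without a Z-table. This sketch uses `NormalDist` from Python's standard `statistics` module; the variable names are arbitrary.

```python
import math
from statistics import NormalDist

# Step 1: data from the example
mu, sigma, n = 3.2, 0.5, 36
threshold = 3.35

# Step 2: standard error of the sample mean
standard_error = sigma / math.sqrt(n)

# Step 3: Z-score of the threshold
z = (threshold - mu) / standard_error

# Step 4: upper-tail probability under the standard normal
probability = 1 - NormalDist().cdf(z)

print(f"SE = {standard_error:.4f}, Z = {z:.2f}, P = {probability:.4f}")
# SE = 0.0833, Z = 1.80, P = 0.0359
```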

📋 Summary Table

Case | Distribution of sample mean
Population is normal | \(\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)\) exactly, for any n
Population not normal, n large (≥ 30) | \(\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)\) approximately (CLT)
Population not normal, n small | Normal approximation cannot be used

📝 Key Formulas

\(E(\bar{X}) = \mu\)

\(V(\bar{X}) = \frac{\sigma^2}{n}\)

\(\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}\) (standard error)

\(Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}\)