Estimators and Unbiased Estimates

Scope Label

Core 9758. This branch develops the estimation side of Sampling and Estimation: what an estimator is, what unbiasedness means, and how to compute unbiased estimates of the population mean and variance.

Use it with the hub Sampling and Estimation and the sampling-design branch Sampling Methods.

Parameters, Statistics, Estimators, and Estimates

A population parameter is a numerical feature of the whole population. In H2 estimation, the main parameters are:

population mean $μ$;
population variance $σ^{2}$.

These are fixed but usually unknown.

A sample statistic is computed from sample data. Examples are:

sample mean $\bar{X}$ or observed value $\bar{x}$;
sample variance $S^{2}$ or observed value $s^{2}$.

An estimator is a statistic used as a rule for estimating an unknown population parameter. A point estimate is the numerical value obtained from one observed sample.

For example:

$\bar{X}$ is an estimator of $μ$;
after observing one sample, $\bar{x} = 12.4$ may be a point estimate of $μ$.

The distinction matters:

an estimator is random before sampling;
a point estimate is fixed after the sample has been observed.

Caption: Parameters belong to the population; statistics are computed from a sample and used to estimate those parameters.

What Unbiasedness Means

An estimator is unbiased if its expected value equals the parameter it is estimating.

If $T$ estimates a population parameter $θ$, then $T$ is unbiased if

E(T) = θ .

This is a long-run statement. It does not mean every estimate from every sample is correct.

It means:

one sample may overestimate the parameter;
another sample may underestimate it;
across all possible random samples of the same size, the estimator is centred at the true parameter.

Caption: An unbiased estimator varies from sample to sample, but its long-run centre is the true population parameter.
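
The repeated-sampling idea can be checked exactly on a very small population by listing every possible sample. The sketch below is illustrative only: the population values, the sample size, and the choice of sampling with replacement are assumptions made to keep the enumeration short.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population, chosen only for illustration.
population = [2, 4, 9]
n = 2  # sample size (assumption)

# True population mean.
mu = Fraction(sum(population), len(population))

# Every equally likely ordered sample of size n, drawn with replacement.
samples = list(product(population, repeat=n))

# Value of the sample mean for each possible sample.
sample_means = [Fraction(sum(s), n) for s in samples]

# Expected value of the estimator: the average over all possible samples.
expected_sample_mean = sum(sample_means) / len(sample_means)

print("population mean:", mu)
print("expected value of sample mean:", expected_sample_mean)
print("smallest and largest single estimates:", min(sample_means), max(sample_means))
```

Individual samples give estimates on both sides of the population mean, yet the average over all samples matches it, which is what unbiasedness asserts.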

Bias Versus Variability

Unbiasedness is not the only property of an estimator.

Two estimators may both be unbiased, but one may fluctuate much more from sample to sample. The estimator with smaller variability is usually more useful in practice.

So estimator quality has two separate dimensions:

bias: is the estimator centred correctly?
variability: how widely does it fluctuate across repeated samples?

Caption: Bias concerns where estimates are centred; variability concerns how widely they fluctuate across repeated samples.
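
As a rough illustration of the two dimensions, the sketch below compares two estimators of $μ$ on a hypothetical population: the sample mean, and the first observation taken on its own. The population, the sample size, and the second estimator are assumptions introduced for this example.

```python
from fractions import Fraction
from itertools import product

population = [2, 4, 9]  # hypothetical population (assumption)
n = 2                   # sample size (assumption)
mu = Fraction(sum(population), len(population))

# All equally likely ordered samples of size n, drawn with replacement.
samples = list(product(population, repeat=n))

def centre_and_spread(values):
    """Mean and variance of an estimator across all possible samples."""
    centre = sum(values) / len(values)
    spread = sum((v - centre) ** 2 for v in values) / len(values)
    return centre, spread

# Estimator A: the sample mean.
mean_a, var_a = centre_and_spread([Fraction(sum(s), n) for s in samples])

# Estimator B: use only the first observation and ignore the rest.
mean_b, var_b = centre_and_spread([Fraction(s[0]) for s in samples])

print("A: centre", mean_a, "variance", var_a)
print("B: centre", mean_b, "variance", var_b)
```

Both estimators are centred at $μ$, so both are unbiased, but the sample mean fluctuates less across samples.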

Unbiased Estimate of the Population Mean

The unbiased estimate of the population mean $μ$ is the sample mean.

For raw data $x_{1}, x_{2}, \dots, x_{n}$,

\bar{x} = \frac{1}{n} \sum x .

For grouped or tabulated data with frequency $f$,

\bar{x} = \frac{\sum f x}{\sum f} .

This is conceptually natural:

the population mean is the average of the population;
the sample mean is the average of the observed sample;
under proper random sampling, the sample mean is centred at the population mean.

Unbiased Estimate of the Population Variance

The population variance $σ^{2}$ measures spread in the whole population.

For a sample, the unbiased estimate of $σ^{2}$ uses denominator $n - 1$:

s^{2} = \frac{1}{n - 1} \sum (x - \bar{x})^{2} .

The computational form is often more convenient:

s^{2} = \frac{\sum x^{2} - \frac{(\sum x)^{2}}{n}}{n - 1} .

For grouped data,

s^{2} = \frac{\sum f x^{2} - \frac{(\sum f x)^{2}}{n}}{n - 1}, n = \sum f .

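The $n - 1$ correction is needed because the deviations are measured from $\bar{x}$, a centre fitted from the same sample. Since $\bar{x}$ is the value that minimises the sum of squared deviations for that sample, $\sum (x - \bar{x})^{2}$ is never larger than the sum of squared deviations from the true mean $μ$. Dividing by $n$ therefore underestimates $σ^{2}$ on average.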

Caption: Measuring spread from the sample mean makes the sample deviations slightly too small on average; using $n - 1$ corrects this downward bias.
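
The three forms above are the same calculation written in different ways. A minimal sketch, assuming a small made-up data set and hypothetical function names, shows them agreeing exactly:

```python
from fractions import Fraction

def s2_definition(xs):
    """Unbiased variance estimate from squared deviations about the sample mean."""
    n = len(xs)
    x_bar = Fraction(sum(xs), n)
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)

def s2_computational(xs):
    """Same estimate from the sums of x and x squared."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return (sum_x2 - Fraction(sum_x ** 2, n)) / (n - 1)

def s2_grouped(values, freqs):
    """Same estimate from a frequency table, with n equal to the sum of f."""
    n = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(values, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(values, freqs))
    return (sum_fx2 - Fraction(sum_fx ** 2, n)) / (n - 1)

# Hypothetical integer data, and the same data written as a frequency table.
data = [3, 7, 7, 11]
table_values, table_freqs = [3, 7, 11], [1, 2, 1]

print(s2_definition(data), s2_computational(data), s2_grouped(table_values, table_freqs))
```

Exact fractions are used here so that the comparison is not affected by rounding; the functions as written assume integer data.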

Why $n - 1$ Appears

The sample mean $\bar{x}$ is chosen from the data. Once $\bar{x}$ is fixed, the deviations

x_{i} - \bar{x}

must satisfy

\sum (x_{i} - \bar{x}) = 0.

So the deviations cannot all vary freely. Informally, one degree of freedom has been used to estimate the centre. Only $n - 1$ deviations are free.

That is why

\frac{1}{n} \sum (x - \bar{x})^{2}

tends to underestimate $σ^{2}$, while

\frac{1}{n - 1} \sum (x - \bar{x})^{2}

is the unbiased estimator used in H2.
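
The size of the underestimate can be seen by averaging both versions over every possible sample from a small population. This is a sketch under stated assumptions: the population is made up, the sample size is 2, and samples are drawn with replacement so that the observations are independent.

```python
from fractions import Fraction
from itertools import product

population = [2, 4, 9]  # hypothetical population (assumption)
n = 2                   # sample size (assumption)

N = len(population)
mu = Fraction(sum(population), N)
# Population variance: average squared deviation from the population mean.
sigma2 = sum((x - mu) ** 2 for x in population) / N

def sum_sq_dev(sample):
    """Sum of squared deviations of a sample about its own mean."""
    x_bar = Fraction(sum(sample), len(sample))
    return sum((x - x_bar) ** 2 for x in sample)

# All equally likely ordered samples of size n, drawn with replacement.
samples = list(product(population, repeat=n))

# Long-run average of each candidate estimator across all samples.
avg_div_n = sum(sum_sq_dev(s) / n for s in samples) / len(samples)
avg_div_n_minus_1 = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)

print("population variance:", sigma2)
print("average when dividing by n:", avg_div_n)
print("average when dividing by n - 1:", avg_div_n_minus_1)
```

In this setting the divide-by-$n$ version averages to $\frac{n - 1}{n} σ^{2}$, while the divide-by-$(n - 1)$ version averages to $σ^{2}$ exactly.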

Worked Example 1: Raw Data

A random sample gives

6, 8, 9, 10, 12.

Find unbiased estimates of the population mean and variance.

First compute:

n = 5, \sum x = 45, \sum x^{2} = 425.

Then

\bar{x} = \frac{45}{5} = 9.

Also

s^{2} = \frac{425 - \frac{45^{2}}{5}}{5 - 1} = \frac{425 - 405}{4} = 5.

Therefore,

\bar{x} = 9, s^{2} = 5.
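
The hand calculation can be cross-checked with Python's standard `statistics` module, whose `variance` function uses the $n - 1$ denominator. The variable names are illustrative.

```python
import statistics

# The raw data from Worked Example 1.
sample = [6, 8, 9, 10, 12]

x_bar = statistics.mean(sample)    # unbiased estimate of the population mean
s2 = statistics.variance(sample)   # sample variance, divides by n - 1

print("x_bar =", x_bar)
print("s^2 =", s2)
```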

Worked Example 2: Summarised Data

A random sample of size $20$ gives

\sum x = 314, \sum x^{2} = 5128.

Find unbiased estimates of the population mean and variance.

The estimated population mean is

\bar{x} = \frac{314}{20} = 15.7.

The estimated population variance is

s^{2} = \frac{5128 - \frac{314^{2}}{20}}{20 - 1} .

Since

\frac{314^{2}}{20} = 4929.8,

we get

s^{2} = \frac{5128 - 4929.8}{19} = \frac{198.2}{19} \approx 10.43.

Thus

\bar{x} = 15.7, s^{2} \approx 10.43.
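
When only the summary totals are given, the computational form can be applied directly. The helper below is a hypothetical convenience function written for this example, not part of any library.

```python
def unbiased_estimates_from_sums(n, sum_x, sum_x2):
    """Return (x_bar, s2) from the sample size and the sums of x and x squared."""
    x_bar = sum_x / n
    # Computational form of the unbiased variance estimate.
    s2 = (sum_x2 - sum_x ** 2 / n) / (n - 1)
    return x_bar, s2

# Summary totals from Worked Example 2.
x_bar, s2 = unbiased_estimates_from_sums(n=20, sum_x=314, sum_x2=5128)

print("x_bar =", round(x_bar, 2))
print("s^2 =", round(s2, 2))
```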

Worked Example 3: Grouped Data

The following frequency table is obtained from a random sample. Find unbiased estimates of the population mean and variance.

$x$	1	2	3	4	5
$f$	2	5	7	4	2

First,

n = \sum f = 20.

Next,

\sum f x = 2 (1) + 5 (2) + 7 (3) + 4 (4) + 2 (5) = 59,

and

\sum f x^{2} = 2 (1^{2}) + 5 (2^{2}) + 7 (3^{2}) + 4 (4^{2}) + 2 (5^{2}) = 199.

Hence

\bar{x} = \frac{59}{20} = 2.95,

and

s^{2} = \frac{199 - \frac{59^{2}}{20}}{19} = \frac{199 - 174.05}{19} = \frac{24.95}{19} \approx 1.31.

Therefore,

\bar{x} = 2.95, s^{2} \approx 1.31.
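
One way to check a grouped-data answer is to expand the frequency table back into individual observations and treat them as raw data. The sketch below does this with the standard `statistics` module; the variable names are illustrative.

```python
import statistics

# Frequency table from Worked Example 3.
values = [1, 2, 3, 4, 5]
freqs = [2, 5, 7, 4, 2]

# Expand the table into the individual observations it summarises.
expanded = [x for x, f in zip(values, freqs) for _ in range(f)]

x_bar = statistics.mean(expanded)
s2 = statistics.variance(expanded)  # divides by n - 1

print("n =", len(expanded))
print("x_bar =", round(x_bar, 2))
print("s^2 =", round(s2, 2))
```

Getting the same result from the expanded list shows that the grouped formula is the raw-data formula with repeated values collected together.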

Calculator Discipline

Many graphing calculators can compute sample standard deviation directly. In H2 work, still check what the calculator output means:

population standard deviation usually divides by $n$;
sample standard deviation usually divides by $n - 1$.

When estimating $σ^{2}$ from a sample, use the sample-variance convention with $n - 1$ .
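
The same distinction exists in software. As an illustration, Python's standard `statistics` module provides both conventions under different names. The data below are reused from Worked Example 1, and the variable names are illustrative.

```python
import statistics

sample = [6, 8, 9, 10, 12]

# Population convention: divides the sum of squared deviations by n.
var_divide_by_n = statistics.pvariance(sample)
sd_divide_by_n = statistics.pstdev(sample)

# Sample convention: divides by n - 1 (the unbiased variance estimate).
var_divide_by_n_minus_1 = statistics.variance(sample)
sd_divide_by_n_minus_1 = statistics.stdev(sample)

print("divide by n:    ", var_divide_by_n, sd_divide_by_n)
print("divide by n - 1:", var_divide_by_n_minus_1, sd_divide_by_n_minus_1)
```

The two variances differ by the factor $\frac{n}{n - 1}$, so the gap is noticeable for small samples and shrinks as $n$ grows.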

Common Pitfalls

Treating “unbiased” as meaning “always exactly correct”.
Forgetting that an estimator is a rule while a point estimate is one observed value.
Using the $n$ denominator when the question asks for an unbiased estimate of population variance.
Confusing sample standard deviation with population standard deviation on a calculator.
Treating raw-data, summarised-data, and grouped-data formulas as unrelated methods.

Revision Checklist

Can you distinguish a parameter, statistic, estimator, and point estimate?
Can you explain unbiasedness as a repeated-sampling idea?
Can you compute $\bar{x}$ from raw and grouped data?
Can you compute $s^{2}$ from raw, summarised, and grouped data?
Can you explain why $n - 1$ appears in the variance estimate?
Can you identify whether a calculator output divides by $n$ or by $n - 1$?

Singapore H2 Math Wiki