Estimators and Unbiased Estimates

Scope Label

Core 9758. This branch develops the estimation side of Sampling and Estimation: what an estimator is, what unbiasedness means, and how to compute unbiased estimates of the population mean and variance.

Use it with the hub Sampling and Estimation and the sampling-design branch Sampling Methods.

Parameters, Statistics, Estimators, and Estimates

A population parameter is a numerical feature of the whole population. In H2 estimation, the main parameters are:

  • population mean $\mu$;
  • population variance $\sigma^2$.

These are fixed but usually unknown.

A sample statistic is computed from sample data. Examples are:

  • sample mean $\bar{X}$ or observed value $\bar{x}$;
  • sample variance $S^2$ or observed value $s^2$.

An estimator is a statistic used as a rule for estimating an unknown population parameter. A point estimate is the numerical value obtained from one observed sample.

For example:

  • $\bar{X}$ is an estimator of $\mu$;
  • after observing one sample, the computed value $\bar{x}$ may be a point estimate of $\mu$.

The distinction matters:

  • an estimator is random before sampling;
  • a point estimate is fixed after the sample has been observed.

Caption: Parameters belong to the population; statistics are computed from a sample and used to estimate those parameters.

What Unbiasedness Means

An estimator is unbiased if its expected value equals the parameter it is estimating.

If $T$ estimates a population parameter $\theta$, then $T$ is unbiased if

$$E(T) = \theta.$$

This is a long-run statement. It does not mean every estimate from every sample is correct.

It means:

  • one sample may overestimate the parameter;
  • another sample may underestimate it;
  • across all possible random samples of the same size, the estimator is centred at the true parameter.

Caption: An unbiased estimator varies from sample to sample, but its long-run centre is the true population parameter.
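The repeated-sampling idea can be checked exactly on a tiny case. The sketch below is illustrative only: the three-value population and the sample size are hypothetical, and samples are assumed to be drawn with replacement so that every ordered sample is equally likely. It lists every possible sample, computes the sample mean of each, and averages those estimates.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy population, chosen only for illustration.
population = [2, 4, 9]
n = 2  # sample size

mu = Fraction(sum(population), len(population))

# Assumption: sampling with replacement, so all ordered samples are equally likely.
samples = list(product(population, repeat=n))
sample_means = [Fraction(sum(s), n) for s in samples]

# Expected value of the estimator = average over all equally likely samples.
expected_sample_mean = sum(sample_means) / len(sample_means)

print("population mean:", mu)
print("possible estimates:", sorted(set(sample_means)))
print("E(sample mean):", expected_sample_mean)
```

Individual estimates fall on both sides of the population mean, yet their average over all possible samples equals it, which is what unbiasedness claims.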

Bias Versus Variability

Unbiasedness is not the only property of an estimator.

Two estimators may both be unbiased, but one may fluctuate much more from sample to sample. The estimator with smaller variability is usually more useful in practice.

So estimator quality has two separate dimensions:

  • bias: is the estimator centred correctly?
  • variability: how widely does it fluctuate across repeated samples?

Caption: Bias concerns where estimates are centred; variability concerns how widely they fluctuate across repeated samples.
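To make the two dimensions concrete, the following sketch compares two estimators of the population mean on a hypothetical toy population: the usual sample mean, and a deliberately wasteful rule that uses only the first observation. Both the population and the second estimator are assumptions introduced for illustration, and sampling is taken to be with replacement.

```python
from fractions import Fraction
from itertools import product

population = [2, 4, 9]  # hypothetical toy population
n = 3                   # sample size

# Assumption: sampling with replacement, all ordered samples equally likely.
samples = list(product(population, repeat=n))

def expectation(values):
    """Average of an estimator's values over all equally likely samples."""
    return sum(values, Fraction(0)) / len(values)

def spread(values):
    """Variance of an estimator's values across all samples."""
    centre = expectation(values)
    return expectation([(v - centre) ** 2 for v in values])

mean_estimates = [Fraction(sum(s), n) for s in samples]
first_only_estimates = [Fraction(s[0]) for s in samples]

print("centres:", expectation(mean_estimates), expectation(first_only_estimates))
print("spreads:", spread(mean_estimates), spread(first_only_estimates))
```

Both rules are centred at the population mean, so both are unbiased, but the sample mean fluctuates far less from sample to sample.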

Unbiased Estimate of the Population Mean

The unbiased estimate of the population mean is the sample mean.

For raw data $x_1, x_2, \dots, x_n$,

$$\bar{x} = \frac{\sum x}{n}.$$

For grouped or tabulated data with frequency $f$,

$$\bar{x} = \frac{\sum fx}{\sum f}.$$

This is conceptually natural:

  • the population mean is the average of the population;
  • the sample mean is the average of the observed sample;
  • under proper random sampling, the sample mean is centred at the population mean.
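The raw-data and grouped-data formulas are the same calculation written two ways. A minimal sketch, using hypothetical data and function names introduced here for illustration:

```python
def mean_raw(xs):
    """Sample mean from a list of raw observations."""
    return sum(xs) / len(xs)

def mean_grouped(values, freqs):
    """Sample mean from distinct values and their frequencies."""
    n = sum(freqs)
    return sum(f * x for x, f in zip(values, freqs)) / n

# Hypothetical sample, written once as raw data and once as a frequency table.
raw = [3, 3, 5, 5, 5, 8]
values, freqs = [3, 5, 8], [2, 3, 1]

print(mean_raw(raw))
print(mean_grouped(values, freqs))
```

Both calls give the same estimate, because the frequency table is just a compressed way of writing the raw list.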

Unbiased Estimate of the Population Variance

The population variance measures spread in the whole population.

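For a sample, the unbiased estimate of $\sigma^2$ uses denominator $n - 1$:

$$s^2 = \frac{1}{n-1}\sum (x - \bar{x})^2.$$

The computational form is often more convenient:

$$s^2 = \frac{1}{n-1}\left(\sum x^2 - \frac{\left(\sum x\right)^2}{n}\right).$$

For grouped data, with $n = \sum f$,

$$s^2 = \frac{1}{n-1}\left(\sum fx^2 - \frac{\left(\sum fx\right)^2}{n}\right).$$

The correction is needed because the deviations are measured from $\bar{x}$, a centre fitted from the same sample. The squared deviations are therefore slightly too small on average, so dividing their sum by $n$ would underestimate $\sigma^2$.

Caption: Measuring spread from the sample mean makes the sample deviations slightly too small on average; using $n - 1$ corrects this downward bias.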
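The definition, the computational form, and the grouped form all lead to the same number. The sketch below checks this on hypothetical data; the function names are illustrative, not standard.

```python
import math

def var_definition(xs):
    """Unbiased variance estimate from squared deviations about the sample mean."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def var_summarised(n, sum_x, sum_x2):
    """Unbiased variance estimate from n, sum of x, and sum of x squared."""
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

def var_grouped(values, freqs):
    """Unbiased variance estimate from a frequency table."""
    n = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(values, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(values, freqs))
    return var_summarised(n, sum_fx, sum_fx2)

# Hypothetical sample in three equivalent forms.
raw = [3, 3, 5, 5, 5, 8]
a = var_definition(raw)
b = var_summarised(len(raw), sum(raw), sum(x * x for x in raw))
c = var_grouped([3, 5, 8], [2, 3, 1])

print(a, b, c)
print(math.isclose(a, b) and math.isclose(b, c))
```

Note that the grouped version simply builds the summary totals and reuses the summarised formula, which is why the three methods are best learned as one.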

Why $n - 1$ Appears

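The sample mean $\bar{x}$ is computed from the data. Once $\bar{x}$ is fixed, the deviations

$$x_1 - \bar{x},\; x_2 - \bar{x},\; \dots,\; x_n - \bar{x}$$

must satisfy

$$\sum_{i=1}^{n} (x_i - \bar{x}) = 0.$$

So the deviations cannot all vary freely. Informally, one degree of freedom has been used to estimate the centre. Only $n - 1$ deviations are free.

That is why

$$\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$$

tends to underestimate $\sigma^2$, while

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$

is the unbiased estimator used in H2.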
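The size of the downward bias can be seen exactly on a small case. This sketch assumes a hypothetical three-value population and sampling with replacement, then averages both versions of the variance formula over every possible sample of size 2. It also checks that the deviations in each sample sum to zero.

```python
from fractions import Fraction
from itertools import product

population = [2, 4, 9]  # hypothetical toy population
n = 2                   # sample size

N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N  # population variance

def sum_sq_dev(sample):
    """Sum of squared deviations about the sample's own mean."""
    xbar = Fraction(sum(sample), len(sample))
    deviations = [x - xbar for x in sample]
    assert sum(deviations) == 0  # the constraint that costs one degree of freedom
    return sum(d ** 2 for d in deviations)

# Assumption: sampling with replacement, all ordered samples equally likely.
samples = list(product(population, repeat=n))
avg_ss = sum(sum_sq_dev(s) for s in samples) / len(samples)

expected_div_n = avg_ss / n
expected_div_n_minus_1 = avg_ss / (n - 1)

print("population variance:", sigma2)
print("E(divide by n):", expected_div_n)
print("E(divide by n - 1):", expected_div_n_minus_1)
```

In this toy case, dividing by $n$ gives an expected value of only half the population variance, while dividing by $n - 1$ recovers it exactly.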

Worked Example 1: Raw Data

A random sample gives

Find unbiased estimates of the population mean and variance.

First compute:

Then

Also

Therefore,

Worked Example 2: Summarised Data

A random sample of size gives

Find unbiased estimates of the population mean and variance.

The estimated population mean is

The estimated population variance is

Since

we get

Thus

Worked Example 3: Grouped Data

The following frequency table is obtained from a sample.

  Value x        1   2   3   4   5
  Frequency f    2   5   7   4   2

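First,

$$n = \sum f = 2 + 5 + 7 + 4 + 2 = 20.$$

Next,

$$\sum fx = 2(1) + 5(2) + 7(3) + 4(4) + 2(5) = 59$$

and

$$\sum fx^2 = 2(1^2) + 5(2^2) + 7(3^2) + 4(4^2) + 2(5^2) = 199.$$

Hence

$$\bar{x} = \frac{59}{20} = 2.95$$

and

$$s^2 = \frac{1}{19}\left(199 - \frac{59^2}{20}\right) = \frac{24.95}{19}.$$

Therefore, the unbiased estimates of the population mean and variance are $2.95$ and $1.31$ (3 s.f.) respectively.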
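The grouped-data working can be cross-checked in a few lines. This sketch assumes the table is read as values 1 to 5 with frequencies 2, 5, 7, 4, 2. It computes the totals with the grouped formulas, then compares against Python's standard `statistics` module applied to the expanded raw list.

```python
from fractions import Fraction
import statistics

# Assumed reading of the frequency table: values 1..5, frequencies 2, 5, 7, 4, 2.
values = [1, 2, 3, 4, 5]
freqs = [2, 5, 7, 4, 2]

n = sum(freqs)
sum_fx = sum(f * x for x, f in zip(values, freqs))
sum_fx2 = sum(f * x * x for x, f in zip(values, freqs))

xbar = Fraction(sum_fx, n)
s2 = (sum_fx2 - Fraction(sum_fx ** 2, n)) / (n - 1)

# Expand the table back into raw observations for an independent check.
expanded = [x for x, f in zip(values, freqs) for _ in range(f)]

print(n, sum_fx, sum_fx2)
print(float(xbar), round(float(s2), 4))
print(statistics.mean(expanded), statistics.variance(expanded))
```

`statistics.variance` uses the $n - 1$ divisor, so it should agree with the grouped formula up to rounding.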

Calculator Discipline

Many graphing calculators can compute sample standard deviation directly. In H2 work, still check what the calculator output means:

  • population standard deviation usually divides by $n$;
  • sample standard deviation usually divides by $n - 1$.

When estimating $\sigma^2$ from a sample, use the sample-variance convention with $n - 1$.
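The same two conventions appear in software, which makes the difference easy to see. As an analogy to the calculator outputs, Python's standard `statistics` module offers both: `pvariance` and `pstdev` divide by $n$, while `variance` and `stdev` divide by $n - 1$. The data below are hypothetical.

```python
import statistics

data = [3, 3, 5, 5, 5, 8]  # hypothetical sample
n = len(data)

pop_var = statistics.pvariance(data)   # divides by n
samp_var = statistics.variance(data)   # divides by n - 1

print("divide by n:     ", pop_var, statistics.pstdev(data))
print("divide by n - 1: ", samp_var, statistics.stdev(data))

# The two conventions differ only by the factor n / (n - 1).
print(abs(samp_var - pop_var * n / (n - 1)) < 1e-12)
```

For an unbiased estimate of the population variance, the second line of output is the one to quote.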

Common Pitfalls

  • Treating “unbiased” as meaning “always exactly correct”.
  • Forgetting that an estimator is a rule while a point estimate is one observed value.
  • Using the denominator $n$ when the question asks for an unbiased estimate of population variance.
  • Confusing sample standard deviation with population standard deviation on a calculator.
  • Treating raw-data, summarised-data, and grouped-data formulas as unrelated methods.

Revision Checklist

  • Can you distinguish a parameter, statistic, estimator, and point estimate?
  • Can you explain unbiasedness as a repeated-sampling idea?
  • Can you compute $\bar{x}$ from raw and grouped data?
  • Can you compute $s^2$ from raw, summarised, and grouped data?
  • Can you explain why $n - 1$ appears in the variance estimate?
  • Can you identify whether a calculator output divides by $n$ or by $n - 1$?