Hypothesis Testing Logic and Decision Methods

Scope Label

Core 9758. This branch develops the reasoning language behind one-tailed and two-tailed $z$-tests for a single population mean.

Use it with the hub Hypothesis Testing and the procedural branch Z-Tests for a Population Mean.

A Test Measures Evidence Under $H_0$

A hypothesis test starts by temporarily assuming the null hypothesis is true.

It then asks:

Under that assumption, how unusual is the sample result?

This is the reason the sampling distribution is built under $H_0$. The test is not comparing two equally trusted claims. It is asking whether the sample gives enough evidence to reject the baseline claim.

Caption: A hypothesis test starts by assuming $H_0$, then measures how unusual the sample is under that assumption before making a decision.

Statistical Hypotheses

A statistical hypothesis is a claim about a population parameter.

For a population mean $\mu$, the null hypothesis is usually

$H_0: \mu = \mu_0$

where $\mu_0$ is the claimed or baseline value. The equality is essential because it fixes the reference distribution used for the test statistic.

The alternative hypothesis may be

$H_1: \mu < \mu_0$, $H_1: \mu > \mu_0$, or $H_1: \mu \neq \mu_0$.

Choose $H_1$ from the wording and purpose of the investigation, not from the sample mean after it is observed.

Caption: The null hypothesis fixes the reference value, while the alternative hypothesis states the kind of departure being tested.

Translating Wording into $H_1$

Many testing errors happen before calculation begins.

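Wording                   | Meaning             | Alternative
has increased             | definite increase   | $H_1: \mu > \mu_0$
is greater than           | definite increase   | $H_1: \mu > \mu_0$
has decreased             | definite decrease   | $H_1: \mu < \mu_0$
is less than              | definite decrease   | $H_1: \mu < \mu_0$
has changed               | either direction    | $H_1: \mu \neq \mu_0$
is different              | either direction    | $H_1: \mu \neq \mu_0$
is affected               | either direction    | $H_1: \mu \neq \mu_0$
has overstated the mean   | true mean is lower  | $H_1: \mu < \mu_0$
has understated the mean  | true mean is higher | $H_1: \mu > \mu_0$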

For example, “the machine is not correctly calibrated” usually means the mean may be too high or too low, so the test is two-tailed.

Tail Direction

The form of $H_1$ determines where the rejection region lies.

Lower-tailed test

If $H_1: \mu < \mu_0$, then unusually small test-statistic values support $H_1$.

Upper-tailed test

If $H_1: \mu > \mu_0$, then unusually large test-statistic values support $H_1$.

Two-tailed test

If $H_1: \mu \neq \mu_0$, then unusually extreme values in either direction support $H_1$.

Caption: The alternative hypothesis decides which tail or tails contain the rejection region.

Significance Level

The significance level $\alpha$ is the chosen threshold for rejecting $H_0$.

It controls how extreme a result must be before it counts as sufficient evidence against $H_0$.

Typical levels are $\alpha = 0.10$, $\alpha = 0.05$, and $\alpha = 0.01$ (that is, 10%, 5%, and 1%).

A smaller significance level demands stronger evidence before rejecting $H_0$.

There is a more precise interpretation: $\alpha$ is the probability of rejecting $H_0$ when $H_0$ is actually true. In words, it is the allowed risk of a false rejection.

So a significance level of $\alpha$ means:

$P(\text{reject } H_0 \mid H_0 \text{ is true}) = \alpha$

This does not mean there is a probability $\alpha$ that $H_0$ is true. The probability is calculated under the assumption that $H_0$ is true.
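The "allowed risk of a false rejection" reading can be checked by simulation. The sketch below is illustrative only: it assumes a two-tailed $z$-test with known population standard deviation, and the values of `mu0`, `sigma`, `n`, `trials`, and `seed` are made-up defaults, not figures from these notes. Every sample is drawn with the null hypothesis true, so each rejection is a false rejection.

```python
import random
from statistics import NormalDist, fmean


def false_rejection_rate(alpha=0.05, mu0=50.0, sigma=4.0, n=30,
                         trials=10_000, seed=1):
    """Estimate how often a two-tailed z-test rejects H0 when H0 is true."""
    rng = random.Random(seed)
    # Two-tailed test: alpha/2 in each tail of the standard normal.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        # The true mean equals mu0 (hypothetical value), so H0 really holds.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / n ** 0.5)
        if abs(z) >= z_crit:
            rejections += 1
    return rejections / trials


rate = false_rejection_rate()
print(f"Estimated false-rejection rate: {rate:.3f}")
```

Over many repetitions the estimated rate should sit close to the chosen $\alpha$, which is what "allowed risk of a false rejection" means.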

Critical-Region Method

The critical region is the set of test-statistic values that lead to rejection of $H_0$.

The critical-region method is:

  1. decide the test direction from $H_1$;
  2. use $\alpha$ to find the critical value or values;
  3. compute the observed test statistic;
  4. reject $H_0$ if the statistic lies in the critical region.

For a $z$-test:

  • an upper-tailed test uses the right tail;
  • a lower-tailed test uses the left tail;
  • a two-tailed test uses $\alpha/2$ in each tail.
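The steps above can be sketched in Python using the standard library's `statistics.NormalDist`. The function names `critical_region` and `in_critical_region` and the tail labels `"upper"`, `"lower"`, and `"two"` are hypothetical choices for this illustration. The boundary is treated as part of the critical region here, which is an assumption; for a continuous statistic it makes no practical difference.

```python
from statistics import NormalDist


def critical_region(alpha, tail):
    """Return (lower, upper) critical z-values; None means no cutoff on that side."""
    std_normal = NormalDist()
    if tail == "upper":
        return (None, std_normal.inv_cdf(1 - alpha))   # all of alpha in the right tail
    if tail == "lower":
        return (std_normal.inv_cdf(alpha), None)       # all of alpha in the left tail
    if tail == "two":
        upper = std_normal.inv_cdf(1 - alpha / 2)      # alpha/2 in each tail
        return (-upper, upper)
    raise ValueError("tail must be 'upper', 'lower', or 'two'")


def in_critical_region(z, alpha, tail):
    """True if the observed z-statistic leads to rejection of H0."""
    lower, upper = critical_region(alpha, tail)
    below = lower is not None and z <= lower
    above = upper is not None and z >= upper
    return below or above


for tail in ("upper", "lower", "two"):
    print(tail, critical_region(0.05, tail))
```

Note that the same observed statistic can be inside the critical region for a one-tailed test and outside it for a two-tailed test at the same $\alpha$, which is why the tail must be fixed before the data are inspected.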

p-Value Method

The p-value is the probability, assuming $H_0$ is true, of obtaining a result at least as extreme as the observed result.

Decision rule:

Reject $H_0$ if p-value $\le \alpha$; otherwise do not reject $H_0$.

A small p-value means the observed result would be unusual if $H_0$ were true. It does not mean that the p-value is the probability that $H_0$ is true.
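As a minimal sketch, the p-value for an observed $z$-statistic can be computed from the standard normal distribution. The function name `p_value` and the tail labels are hypothetical, and "at least as extreme" is interpreted according to the direction of the alternative hypothesis.

```python
from statistics import NormalDist


def p_value(z, tail):
    """p-value of an observed z-statistic, computed assuming H0 is true."""
    cdf = NormalDist().cdf
    if tail == "upper":
        return 1 - cdf(z)              # P(Z >= z)
    if tail == "lower":
        return cdf(z)                  # P(Z <= z)
    if tail == "two":
        return 2 * (1 - cdf(abs(z)))   # both tails beyond |z|
    raise ValueError("tail must be 'upper', 'lower', or 'two'")


z_observed = 1.8  # illustrative value only
for tail in ("upper", "lower", "two"):
    print(f"{tail}-tailed p-value: {p_value(z_observed, tail):.4f}")
```

For the same observed statistic, the two-tailed p-value is twice the smaller one-tailed p-value, so the choice of tail changes how strong the evidence appears.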

Caption: The critical-region and p-value methods express the same testing logic in two different ways.

Critical Region Versus p-Value

The two methods are equivalent when used correctly.

Method          | Question asked                                              | Decision
Critical region | Did the statistic enter the rejection region?               | Reject $H_0$ if yes
p-value         | Is the observed result more extreme than $\alpha$ allows?   | Reject $H_0$ if p-value $\le \alpha$

Use whichever method the question asks for. If no method is specified, either is acceptable if the working is clear.
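The equivalence of the two methods can be checked numerically. The sketch below is an illustration under assumed conventions (hypothetical function names, rejection when the statistic is on or beyond the critical value, and rejection when the p-value is at most $\alpha$). It compares the two decisions over a grid of $z$-values for each tail type and several levels.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def reject_by_critical_region(z, alpha, tail):
    """Decision from comparing z with the critical value(s)."""
    if tail == "upper":
        return z >= STD_NORMAL.inv_cdf(1 - alpha)
    if tail == "lower":
        return z <= STD_NORMAL.inv_cdf(alpha)
    if tail == "two":
        return abs(z) >= STD_NORMAL.inv_cdf(1 - alpha / 2)
    raise ValueError("tail must be 'upper', 'lower', or 'two'")


def reject_by_p_value(z, alpha, tail):
    """Decision from comparing the p-value with alpha."""
    if tail == "upper":
        p = 1 - STD_NORMAL.cdf(z)
    elif tail == "lower":
        p = STD_NORMAL.cdf(z)
    elif tail == "two":
        p = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    else:
        raise ValueError("tail must be 'upper', 'lower', or 'two'")
    return p <= alpha


# z-values from -4.0 to 4.0 in steps of 0.1
z_grid = [k / 10 for k in range(-40, 41)]
all_agree = all(
    reject_by_critical_region(z, alpha, tail) == reject_by_p_value(z, alpha, tail)
    for alpha in (0.10, 0.05, 0.01)
    for tail in ("upper", "lower", "two")
    for z in z_grid
)
print("Methods agree on every case:", all_agree)
```

The agreement is not a coincidence: the critical value is exactly the statistic whose p-value equals $\alpha$, so the two comparisons are the same test written in different units.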

Mini-Examples: Wording Before Calculation

Overstated mean

Suppose a school claims that the mean score is at least $\mu_0$, and the question asks whether the school has overstated the mean.

The suspected direction is lower than the claim, so write

$H_0: \mu = \mu_0$, $H_1: \mu < \mu_0$.

Correctly calibrated

Suppose a machine is claimed to fill bottles with mean volume $\mu_0$ ml, and the question asks whether the machine is correctly calibrated.

Too low and too high both matter, so write

$H_0: \mu = \mu_0$, $H_1: \mu \neq \mu_0$.

Do not change this to an upper-tailed test just because the observed sample mean happens to be above $\mu_0$.

Writing Conclusions

A testing conclusion must be cautious and contextual.

If $H_0$ is rejected:

There is sufficient evidence at the $\alpha$ level that …

If $H_0$ is not rejected:

There is insufficient evidence at the $\alpha$ level that …

Avoid saying:

  • “$H_0$ is true”;
  • “$H_0$ is accepted”;
  • “the alternative has been proved”.

A hypothesis test makes a decision at a chosen level of evidence. It does not deliver absolute proof.

Common Pitfalls

  • Choosing the tail direction from the observed sample mean instead of the question wording.
  • Forgetting that $H_0$ must contain equality.
  • Splitting $\alpha$ across two tails in a one-tailed test.
  • Failing to split $\alpha$ across two tails in a two-tailed test.
  • Treating the p-value as the probability that $H_0$ is true.
  • Writing a conclusion without context.

Revision Checklist

  • Can you explain why the test statistic is considered under $H_0$?
  • Can you translate wording into $H_1$?
  • Can you decide whether a test is lower-tailed, upper-tailed, or two-tailed?
  • Can you use both critical-region and p-value methods?
  • Can you write a proper conclusion without overclaiming?