Hypothesis Testing Logic and Decision Methods
Scope Label
Core 9758. This branch develops the reasoning language behind one-tailed and two-tailed $z$-tests for a single population mean.
Use it with the hub Hypothesis Testing and the procedural branch Z-Tests for a Population Mean.
A Test Measures Evidence Under $H_0$
A hypothesis test starts by temporarily assuming the null hypothesis is true.
It then asks:
Under that assumption, how unusual is the sample result?
This is the reason the sampling distribution is built under $H_0$. The test is not comparing two equally trusted claims. It is asking whether the sample gives enough evidence to reject the baseline claim.
Caption: A hypothesis test starts by assuming $H_0$, then measures how unusual the sample is under that assumption before making a decision.
Statistical Hypotheses
A statistical hypothesis is a claim about a population parameter.
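For a population mean $\mu$, the null hypothesis is usually
$$H_0: \mu = \mu_0,$$
where $\mu_0$ is the claimed value.
The equality is essential because it fixes the reference distribution used for the test statistic.
The alternative hypothesis may be
$$H_1: \mu < \mu_0, \qquad H_1: \mu > \mu_0, \qquad \text{or} \qquad H_1: \mu \ne \mu_0.$$
Choose $H_1$ from the wording and purpose of the investigation, not from the sample mean after it is observed.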
Caption: The null hypothesis fixes the reference value, while the alternative hypothesis states the kind of departure being tested.
Translating Wording into $H_1$
Many testing errors happen before calculation begins.
| Wording | Meaning | Alternative $H_1$ |
|---|---|---|
| has increased | definite increase | $\mu > \mu_0$ |
| is greater than | definite increase | $\mu > \mu_0$ |
| has decreased | definite decrease | $\mu < \mu_0$ |
| is less than | definite decrease | $\mu < \mu_0$ |
| has changed | either direction | $\mu \ne \mu_0$ |
| is different | either direction | $\mu \ne \mu_0$ |
| is affected | either direction | $\mu \ne \mu_0$ |
| has overstated the mean | true mean is lower | $\mu < \mu_0$ |
| has understated the mean | true mean is higher | $\mu > \mu_0$ |
For example, “the machine is not correctly calibrated” usually means the mean may be too high or too low, so the test is two-tailed.
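The wording rules can be sketched as a small lookup. This is only an illustration: the dictionary name `WORDING_TO_H1`, the function `alternative_for`, and the text labels `mu` and `mu0` are hypothetical, and real questions still need to be read in context.

```python
# Hypothetical lookup from question wording to the form of H1.
# "mu" is the population mean and "mu0" the claimed value (labels assumed here).
WORDING_TO_H1 = {
    "has increased": "mu > mu0",
    "is greater than": "mu > mu0",
    "has decreased": "mu < mu0",
    "is less than": "mu < mu0",
    "has changed": "mu != mu0",
    "is different": "mu != mu0",
    "is affected": "mu != mu0",
    "has overstated the mean": "mu < mu0",
    "has understated the mean": "mu > mu0",
}


def alternative_for(wording: str) -> str:
    """Return the H1 form for a listed phrase; raise KeyError for anything else."""
    key = wording.strip().lower()
    if key not in WORDING_TO_H1:
        raise KeyError(f"no rule for wording {wording!r}; decide from the context")
    return "H1: " + WORDING_TO_H1[key]


print(alternative_for("has overstated the mean"))
```

The lookup deliberately fails on unlisted phrases such as "is not correctly calibrated", because those need a judgement about the context rather than a mechanical rule.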
Tail Direction
The form of $H_1$ determines where the rejection region lies.
Lower-tailed test
If
$$H_1: \mu < \mu_0,$$
then unusually small test-statistic values support $H_1$.
Upper-tailed test
If
$$H_1: \mu > \mu_0,$$
then unusually large test-statistic values support $H_1$.
Two-tailed test
If
$$H_1: \mu \ne \mu_0,$$
then unusually extreme values in either direction support $H_1$.
Caption: The alternative hypothesis decides which tail or tails contain the rejection region.
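As a rough illustration of how the tail direction places the rejection region, the sketch below describes the region on the standard normal scale. It assumes a $z$-test and uses Python's standard-library `statistics.NormalDist`; the function name `rejection_region` and the tail labels are hypothetical.

```python
from statistics import NormalDist


def rejection_region(tail: str, alpha: float) -> str:
    """Describe the z rejection region for a 'lower', 'upper', or 'two' tailed test."""
    std = NormalDist()  # standard normal: the reference distribution under H0
    if tail == "lower":
        return f"z <= {std.inv_cdf(alpha):.3f}"
    if tail == "upper":
        return f"z >= {std.inv_cdf(1 - alpha):.3f}"
    if tail == "two":
        c = std.inv_cdf(1 - alpha / 2)  # alpha is shared equally between the tails
        return f"z <= {-c:.3f} or z >= {c:.3f}"
    raise ValueError("tail must be 'lower', 'upper', or 'two'")


for tail in ("lower", "upper", "two"):
    print(tail, "->", rejection_region(tail, 0.05))
```

At the 5% level the one-tailed boundaries sit at about 1.645 standard errors from the centre, while the two-tailed boundaries move out to about 1.960 because each tail holds only 2.5%.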
Significance Level
The significance level $\alpha$ is the chosen threshold for rejecting $H_0$.
It controls how extreme a result must be before it counts as sufficient evidence against $H_0$.
Typical levels are:
$$\alpha = 0.10, \qquad \alpha = 0.05, \qquad \alpha = 0.01.$$
A smaller significance level demands stronger evidence before rejecting $H_0$.
There is a more precise interpretation: $\alpha$ is the probability of rejecting $H_0$ when $H_0$ is actually true. In words, it is the allowed risk of a false rejection.
So a significance level of $\alpha$ means:
$$P(\text{reject } H_0 \mid H_0 \text{ is true}) = \alpha.$$
This does not mean there is a probability $\alpha$ that $H_0$ is true. The probability is calculated under the assumption that $H_0$ is true.
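The false-rejection interpretation can be checked with a small simulation. The sketch below is illustrative only: it assumes an upper-tailed $z$-test with known standard deviation and the usual statistic $z = (\bar{x} - \mu_0)/(\sigma/\sqrt{n})$, and the values `mu0=50`, `sigma=4`, `n=5` are hypothetical. Every sample is drawn with the null hypothesis true, so each rejection is a false rejection.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def false_rejection_rate(alpha, mu0=50.0, sigma=4.0, n=5, trials=20000, seed=1):
    """Estimate P(reject H0 | H0 true) for an upper-tailed z-test by simulation."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z_crit = NormalDist().inv_cdf(1 - alpha)
    rejections = 0
    for _ in range(trials):
        # H0 is true by construction: the population mean really is mu0.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / sqrt(n))
        if z >= z_crit:
            rejections += 1
    return rejections / trials


for alpha in (0.10, 0.05, 0.01):
    print(alpha, false_rejection_rate(alpha))
```

The estimated rates should land close to the chosen levels, which is the sense in which the significance level is a risk set in advance rather than a probability about the null hypothesis itself.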
Critical-Region Method
The critical region is the set of test-statistic values that lead to rejection of $H_0$.
The critical-region method is:
- decide the test direction from $H_1$;
- use $\alpha$ to find the critical value or values;
- compute the observed test statistic;
- reject $H_0$ if the statistic lies in the critical region.
For a $z$-test:
- an upper-tailed test uses the right tail;
- a lower-tailed test uses the left tail;
- a two-tailed test uses $\alpha/2$ in each tail.
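The four steps can be sketched as one function. This is a minimal sketch, assuming a $z$-test with known population standard deviation and the statistic $z = (\bar{x} - \mu_0)/(\sigma/\sqrt{n})$; the function name, parameter names, and the numbers in the usage lines are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def critical_region_test(x_bar, mu0, sigma, n, tail, alpha):
    """Return (z, reject) for a known-sigma z-test using the critical-region method."""
    std = NormalDist()
    # Compute the observed test statistic under H0: mu = mu0.
    z = (x_bar - mu0) / (sigma / sqrt(n))
    # The tail comes from H1; alpha gives the critical value(s).
    if tail == "upper":
        reject = z >= std.inv_cdf(1 - alpha)
    elif tail == "lower":
        reject = z <= std.inv_cdf(alpha)
    elif tail == "two":
        reject = abs(z) >= std.inv_cdf(1 - alpha / 2)
    else:
        raise ValueError("tail must be 'upper', 'lower', or 'two'")
    return z, reject


# Hypothetical data: sample mean 51.2 from n = 36, claimed mean 50, sigma 4.
print(critical_region_test(51.2, 50, 4, 36, "upper", 0.05))
print(critical_region_test(51.2, 50, 4, 36, "two", 0.05))
```

With these made-up numbers the statistic is about 1.8, which is inside the upper-tailed region at the 5% level but outside the two-tailed region. The same data can lead to different decisions, which is why the tail must be fixed from the question wording before the data are examined.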
p-Value Method
The p-value is the probability, assuming $H_0$ is true, of obtaining a result at least as extreme as the observed result.
Decision rule:
Reject $H_0$ if p-value $\le \alpha$; otherwise, do not reject $H_0$.
A small p-value means the observed result would be unusual if $H_0$ were true. It does not mean that the p-value is the probability that $H_0$ is true.
Caption: The critical-region and p-value methods express the same testing logic in two different ways.
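A minimal sketch of the p-value calculation for a $z$-statistic follows. It assumes the statistic has already been computed and is standard normal under the null hypothesis; the function name `p_value` and the tail labels are hypothetical.

```python
from statistics import NormalDist


def p_value(z: float, tail: str) -> float:
    """Probability, under H0, of a z-statistic at least as extreme as the one observed."""
    std = NormalDist()
    if tail == "lower":
        return std.cdf(z)  # area to the left of z
    if tail == "upper":
        return 1 - std.cdf(z)  # area to the right of z
    if tail == "two":
        return 2 * (1 - std.cdf(abs(z)))  # both tails beyond |z|
    raise ValueError("tail must be 'lower', 'upper', or 'two'")


# Hypothetical observed statistic z = 1.8.
print(round(p_value(1.8, "upper"), 4))
print(round(p_value(1.8, "two"), 4))
```

For the illustrative statistic 1.8, the upper-tailed p-value is roughly 0.036 and the two-tailed p-value is roughly 0.072, so "at least as extreme" depends on the alternative hypothesis.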
Critical Region Versus p-Value
The two methods are equivalent when used correctly.
| Method | Question asked | Decision |
|---|---|---|
| Critical region | Did the statistic enter the rejection region? | Reject $H_0$ if yes |
| p-value | Is the observed result more extreme than $\alpha$ allows? | Reject $H_0$ if p-value $\le \alpha$ |
Use whichever method the question asks for. If no method is specified, either is acceptable if the working is clear.
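The equivalence can be checked numerically. The sketch below, which assumes a $z$-test and uses hypothetical function names, applies both decision rules to a grid of test-statistic values and collects any case where they disagree.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal reference distribution under H0


def reject_by_region(z, tail, alpha):
    """Critical-region decision: is z inside the rejection region?"""
    if tail == "lower":
        return z <= STD.inv_cdf(alpha)
    if tail == "upper":
        return z >= STD.inv_cdf(1 - alpha)
    return abs(z) >= STD.inv_cdf(1 - alpha / 2)  # two-tailed


def reject_by_p_value(z, tail, alpha):
    """p-value decision: is the tail probability at most alpha?"""
    if tail == "lower":
        p = STD.cdf(z)
    elif tail == "upper":
        p = 1 - STD.cdf(z)
    else:
        p = 2 * (1 - STD.cdf(abs(z)))  # two-tailed
    return p <= alpha


z_grid = [i / 10 for i in range(-40, 41)]  # z from -4.0 to 4.0 in steps of 0.1
disagreements = [
    (z, tail, alpha)
    for tail in ("lower", "upper", "two")
    for alpha in (0.10, 0.05, 0.01)
    for z in z_grid
    if reject_by_region(z, tail, alpha) != reject_by_p_value(z, tail, alpha)
]
print(disagreements)
```

An empty list means the two methods gave the same decision at every grid point. They can only differ through rounding when the statistic sits essentially on the critical value.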
Mini-Examples: Wording Before Calculation
Overstated mean
Suppose a school claims that the mean score is at least some stated value $\mu_0$, and the question asks whether the school has overstated the mean.
The suspected direction is lower than the claim, so write
$$H_0: \mu = \mu_0, \qquad H_1: \mu < \mu_0.$$
Correctly calibrated
Suppose a machine is claimed to fill bottles with mean volume $\mu_0$ ml, and the question asks whether the machine is correctly calibrated.
Too low and too high both matter, so write
$$H_0: \mu = \mu_0, \qquad H_1: \mu \ne \mu_0.$$
Do not change this to an upper-tailed test just because the observed sample mean happens to be above $\mu_0$.
Writing Conclusions
A testing conclusion must be cautious and contextual.
If $H_0$ is rejected:
There is sufficient evidence at the $100\alpha\%$ significance level that …
If $H_0$ is not rejected:
There is insufficient evidence at the $100\alpha\%$ significance level that …
Avoid saying:
- “$H_0$ is true”;
- “$H_0$ is accepted”;
- “the alternative has been proved”.
A hypothesis test makes a decision at a chosen level of evidence. It does not deliver absolute proof.
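The conclusion templates can be sketched as a small helper that keeps the wording cautious and forces a contextual claim to be supplied. The function name `write_conclusion` and the example claim are hypothetical.

```python
def write_conclusion(reject: bool, alpha: float, claim: str) -> str:
    """Build a cautious, contextual conclusion; `claim` states H1 in context."""
    level = f"{alpha * 100:g}%"  # e.g. 0.05 -> "5%"
    strength = "sufficient" if reject else "insufficient"
    return f"There is {strength} evidence at the {level} significance level that {claim}."


print(write_conclusion(True, 0.05, "the mean fill volume has changed"))
print(write_conclusion(False, 0.01, "the mean fill volume has changed"))
```

Both outcomes are phrased in terms of evidence for the alternative, so the non-rejection case never claims that the null hypothesis is true or accepted.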
Common Pitfalls
- Choosing the tail direction from the observed sample mean instead of the question wording.
- Forgetting that $H_0$ must contain equality.
- Splitting $\alpha$ across two tails in a one-tailed test.
- Failing to split $\alpha$ across two tails in a two-tailed test.
- Treating the p-value as the probability that $H_0$ is true.
- Writing a conclusion without context.
Revision Checklist
- Can you explain why the test statistic is considered under $H_0$?
- Can you translate wording into $H_1$?
- Can you decide whether a test is lower-tailed, upper-tailed, or two-tailed?
- Can you use both critical-region and p-value methods?
- Can you write a proper conclusion without overclaiming?