Scatter, Correlation, and PMCC

Scope Label

Core 9758. Scatter diagrams and the product moment correlation coefficient (PMCC) are core tools for describing bivariate data.

Role in the Topic

This branch explains the first half of correlation and regression: before fitting a regression line, you must decide what the paired data show.

Use it with the hub:

Correlation and Linear Regression

Bivariate Data

Bivariate data consists of paired observations:

(x_{1}, y_{1}), (x_{2}, y_{2}), \dots, (x_{n}, y_{n}).

Each pair belongs to one case. If the pairings are broken, the relationship is broken.

For example, if students’ Mathematics and Physics scores are recorded, $(72, 76)$ means one student scored $72$ in Mathematics and $76$ in Physics. The relationship is not captured by two separate unpaired lists.
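A minimal Python sketch of why the pairing matters. The scores are made up for illustration, and `pmcc` is a helper name assumed here. Both calls use exactly the same two sets of values; only the pairing differs.

```python
import math

def pmcc(xs, ys):
    """Product moment correlation coefficient of paired lists."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    s_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical scores: position i in each list belongs to student i.
maths = [50, 60, 70, 80, 90]
physics = [55, 65, 75, 85, 95]

# The same Physics scores, but attached to the wrong students.
physics_mispaired = [75, 55, 95, 65, 85]

r_paired = pmcc(maths, physics)
r_mispaired = pmcc(maths, physics_mispaired)
print(round(r_paired, 3), round(r_mispaired, 3))
```

With the correct pairing the points lie exactly on a line; with the pairing broken, the same numbers give a much weaker correlation.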

Scatter Diagrams

A scatter diagram plots each pair $(x_{i}, y_{i})$ as a point. It should be inspected before computing or interpreting $r$.

Ask five questions:

Direction: Does $y$ tend to increase or decrease as $x$ increases?
Strength: Are the points close to a line, or widely scattered?
Shape: Is the pattern roughly linear, curved, or irregular?
Outliers: Are there unusual points that may distort the model?
Clusters: Are there subgroups that should not be treated as one simple relationship?

Caption: Scatter diagrams can show positive or negative linear trends, no clear linear trend, or a non-linear pattern that regression should not ignore.

Independent and Dependent Variables

In some contexts, one variable is naturally the input and the other is the response.

For example:

advertising time ⟶ sales .

Here advertising time may be treated as independent and sales as dependent.

But not every association has a clear dependence direction. Mathematics score and English score may be associated without either score directly causing the other.

This distinction matters later because regression lines are directional.

Product Moment Correlation Coefficient

The product moment correlation coefficient $r$ measures the direction and strength of a linear relationship.

-1 \leq r \leq 1

When full data are given, $r$ may be calculated from the summary form

r = \frac{\sum x y - \frac{\sum x \sum y}{n}}{\sqrt{( \sum x^{2} - \frac{( \sum x )^{2}}{n} ) ( \sum y^{2} - \frac{( \sum y )^{2}}{n} )}} .

In practice, the graphing calculator often computes $r$ directly, but the formula shows the structure: $r$ compares how $x$ and $y$ vary together with how much each variable varies on its own.
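As a rough sketch of that structure, the Python below computes the three building blocks separately, assuming the standard form of the formula with a square root in the denominator. The names `s_xy`, `s_xx`, `s_yy` and the small dataset are assumptions chosen for illustration.

```python
import math

# Hypothetical paired data, small enough to check by hand.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

sum_x, sum_y = sum(xs), sum(ys)

# How x and y vary together.
s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
# How much each variable varies on its own.
s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
s_yy = sum(y * y for y in ys) - sum_y ** 2 / n

r = s_xy / math.sqrt(s_xx * s_yy)
print(s_xy, s_xx, s_yy, round(r, 4))
```

Here the joint variation is 6, the separate variations are 10 and 6, and so $r = 6 / \sqrt{60}$.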

The sign gives direction:

$r > 0$ : positive linear association
$r < 0$ : negative linear association

The magnitude gives strength:

$|r|$ close to $1$ : strong linear association
$|r|$ close to $0$ : weak or no linear association

Caption: The sign of $r$ gives direction, while $|r|$ reflects the strength of a linear relationship.

Calculating $r$ from Summarised Data

Sometimes a question gives summary values instead of the full dataset.

For example, suppose $n = 7$ and

\sum x = 21, \sum y = 21, \sum x^{2} = 71, \sum y^{2} = 68.5, \sum x y = 69.

Then

r = \frac{69 - \frac{21 ( 21 )}{7}}{\sqrt{( 71 - \frac{21^{2}}{7} ) ( 68.5 - \frac{21^{2}}{7} )}} = \frac{6}{\sqrt{( 8 ) ( 5.5 )}} = \frac{6}{\sqrt{44}} .

r \approx 0.905.

This indicates a strong positive linear relationship, provided the scatter diagram does not reveal a serious structural problem such as curvature or an influential outlier.
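The same calculation can be checked with a short Python sketch. The function name `pmcc_from_sums` and its parameter names are assumptions made for this example; the numbers are the summary values given above.

```python
import math

def pmcc_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Compute r from summary totals, using a square root in the denominator."""
    s_xy = sum_xy - sum_x * sum_y / n
    s_xx = sum_x2 - sum_x ** 2 / n
    s_yy = sum_y2 - sum_y ** 2 / n
    return s_xy / math.sqrt(s_xx * s_yy)

r = pmcc_from_sums(n=7, sum_x=21, sum_y=21, sum_x2=71, sum_y2=68.5, sum_xy=69)
print(round(r, 3))
```

The intermediate values are 6, 8 and 5.5, so the result is $6 / \sqrt{44}$, which rounds to 0.905.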

Further Properties of $r$

The coefficient $r$ is dimensionless. It has no units, even when $x$ and $y$ have units.

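The value of $r$ is unchanged by a change of origin or by a positive change of scale in either variable, such as converting temperatures from Celsius to Fahrenheit. Such transformations shift and stretch the scatter diagram without altering how closely the points follow a line. Multiplying one variable by a negative constant reverses the sign of $r$ but leaves $|r|$ unchanged.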

However, $r$ can change if:

new data pairs are added
an outlier is removed
a different subset of the data is used

So $r$ belongs to the particular dataset being analysed. It is not a permanent property of the real-world variables.
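These properties can be illustrated with a small Python sketch. The temperature and response values are hypothetical, and `pmcc` is a helper name assumed here.

```python
import math

def pmcc(xs, ys):
    """Product moment correlation coefficient of paired lists."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    s_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical data: temperature in Celsius and some response y.
celsius = [10, 15, 20, 25, 30]
y = [12, 18, 21, 30, 33]

# Change of origin and positive change of scale: r is unchanged.
fahrenheit = [9 * c / 5 + 32 for c in celsius]
r_celsius = pmcc(celsius, y)
r_fahrenheit = pmcc(fahrenheit, y)

# Adding one unusual pair changes the dataset, so r changes too.
r_with_outlier = pmcc(celsius + [35], y + [5])

print(round(r_celsius, 3), round(r_fahrenheit, 3), round(r_with_outlier, 3))
```

The first two values agree, while the single extra point pulls $r$ from close to 1 down to close to 0.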

What $r$ Does Not Say

The coefficient $r$ is not a measure of every possible kind of relationship.

If $r \approx 0$, the correct interpretation is:

There is little evidence of a linear relationship.

It is not:

There is no relationship.

A curved pattern may have $r$ close to $0$ even when the variables are strongly related.
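A minimal sketch of such a case, using the made-up relationship $y = x^{2}$ on values of $x$ placed symmetrically about zero. The helper name `pmcc` is an assumption for this example.

```python
import math

def pmcc(xs, ys):
    """Product moment correlation coefficient of paired lists."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    s_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    return s_xy / math.sqrt(s_xx * s_yy)

xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]  # y is completely determined by x

r = pmcc(xs, ys)
print(r)
```

Here $y$ is known exactly once $x$ is known, yet the positive and negative halves cancel and $r$ is zero.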

Correlation Is Not Causation

A strong correlation does not prove that one variable causes the other.

Possible explanations include:

one variable may influence the other
a third hidden variable may influence both
both variables may change over time for unrelated reasons
the association may be coincidental

So an exam interpretation should not say “causes” unless the context provides causal evidence.
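The hidden-variable explanation can be sketched with a small simulation. Everything here is an assumption made for illustration: the variable names, the noise model, and the fixed random seed. In this sketch $x$ and $y$ never influence each other; both are built from the same hidden variable.

```python
import math
import random

def pmcc(xs, ys):
    """Product moment correlation coefficient of paired lists."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    s_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    return s_xy / math.sqrt(s_xx * s_yy)

rng = random.Random(0)  # fixed seed so the run is repeatable

# A hidden third variable drives both x and y.
hidden = [rng.uniform(0, 10) for _ in range(200)]
xs = [h + rng.gauss(0, 1) for h in hidden]
ys = [h + rng.gauss(0, 1) for h in hidden]

r_xy = pmcc(xs, ys)

# Remove the hidden variable's contribution: only independent noise remains.
r_noise = pmcc([x - h for x, h in zip(xs, hidden)],
               [y - h for y, h in zip(ys, hidden)])

print(round(r_xy, 2), round(r_noise, 2))
```

The first value is strongly positive even though neither variable affects the other; once the hidden variable is accounted for, the remaining correlation is close to zero.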

Why the Scatter Diagram Still Matters

Different datasets can have similar values of $r$ but very different structures.

Caption: Datasets with similar correlation coefficients can have very different scatter-plot structures.

This is why the order should be:

inspect the scatter diagram
decide whether a linear model is sensible
interpret $r$ in light of the diagram

not the other way around.
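One well-known illustration, not part of these notes, is Anscombe's quartet. The sketch below uses its first two datasets, which share the same $x$ values; `pmcc` is a helper name assumed here.

```python
import math

def pmcc(xs, ys):
    """Product moment correlation coefficient of paired lists."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    s_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    return s_xy / math.sqrt(s_xx * s_yy)

# Anscombe's quartet, datasets I and II.
xs = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ys_linear = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
ys_curved = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

r_linear = pmcc(xs, ys_linear)
r_curved = pmcc(xs, ys_curved)
print(round(r_linear, 3), round(r_curved, 3))
```

Both values are about 0.816, yet a scatter diagram shows the first dataset scattered about a line and the second lying on a smooth curve, so only the first suits a linear model.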

Core Example

Suppose a scatter diagram is roughly linear and the calculator gives

r = -0.91.

A good interpretation is:

There is a strong negative linear relationship between the two variables for the observed data.

A poor interpretation is:

One variable causes the other to decrease.

The first sentence describes association. The second claims causation without evidence.

Common Pitfalls

Saying “no relationship” when $r$ is close to $0$.
Ignoring a curved scatter diagram because the calculator gives $r$.
Forgetting that outliers can heavily affect $r$.
Treating correlation as causation.
Describing strength without mentioning linearity.
Using $r$ before checking the scatter diagram.
Forgetting that $r$ has no units.
Treating $r$ as fixed even after changing the dataset.

Revision Checklist

Can you explain why bivariate data must preserve pairings?
Can you read direction, strength, shape, outliers, and clusters from a scatter diagram?
Can you interpret the sign and magnitude of $r$?
Can you explain why $r$ measures linear association only?
Can you calculate $r$ from summarised data if required?
Can you explain why $r$ is dimensionless?
Can you distinguish association from causation in words?

Singapore H2 Math Wiki
