Data Transformations and Linearisation

Scope Label

Core 9758. Data transformations are explicitly part of the current correlation-and-regression scope.

Role in the Topic

This branch explains what to do when the original scatter diagram is not approximately linear but suggests a structured non-linear relationship.

Why Transform Data?

Correlation and linear regression are designed for linear patterns.

If the scatter diagram is curved, a straight-line regression on the original variables may be misleading. However, some curved relationships can be rewritten as straight-line relationships by transforming one or both variables.

This is called linearisation.

Caption: Linearisation transforms a curved relationship into a new variable pair that can be modelled linearly.

The aim is not to force every dataset into a line. The aim is to test a plausible model form.

The Linearisation Workflow

A reliable workflow is:

Inspect the original scatter diagram.
Identify a plausible non-linear model.
Define the transformed variable or variables.
Draw or inspect the transformed scatter diagram.
Compute $r$ for the transformed variables.
Fit a regression line to the transformed variables.
Interpret the fitted model in the original context.

The model with $|r|$ closest to $1$ is often better supported, but it should still make contextual sense.
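The workflow above can be sketched in Python. This is a minimal illustration rather than a prescribed method: the helper names `pearson_r` and `fit_line` are hypothetical, and the data are made up to follow $y = 3 + 0.5 x^{2}$ exactly, so the transformed plot is perfectly linear.

```python
import math


def pearson_r(xs, ys):
    """Product moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fit_line(xs, ys):
    """Least-squares regression line of ys on xs, as (intercept, gradient)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    gradient = sxy / sxx
    return my - gradient * mx, gradient


# Illustrative data only, generated from y = 3 + 0.5 x^2.
xs = [1, 2, 3, 4, 5, 6]
ys = [3 + 0.5 * x**2 for x in xs]

# Define the transformed variable X = x^2, then compute r and the line of y on X.
X = [x**2 for x in xs]
r_original = pearson_r(xs, ys)
r_transformed = pearson_r(X, ys)
a, b = fit_line(X, ys)

print(round(r_original, 4), round(r_transformed, 4))
print(round(a, 4), round(b, 4))  # estimates of a and b in y = a + b x^2
```

Real data will not give $|r| = 1$ exactly, so the transformed scatter diagram still needs to be inspected.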

Common Transformation Forms

The goal is to rewrite the model in the form

$$Y = A + BX$$

for suitable transformed variables $X$ and $Y$.

Original model	Transformation	Linear form
$y = a + \frac{b}{x}$	let $X = \frac{1}{x}$	$y = a + b X$
$y = a + b x^{2}$	let $X = x^{2}$	$y = a + b X$
$y = a + b \ln x$	let $X = \ln x$	$y = a + b X$
$y = a b^{x}$	let $Y = \ln y$	$Y = \ln a + x \ln b$
$y = a x^{b}$	let $X = \ln x$, $Y = \ln y$	$Y = \ln a + b X$

Logarithmic transformations require positive values where the logarithm is taken. For example, $\ln y$ is only defined when $y > 0$.
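The five forms in the table, together with the positivity requirement for logarithms, can be encoded as a small lookup. This is a sketch only: the names `TRANSFORMS`, `safe_ln` and `transform` are hypothetical, and the sample values are made up.

```python
import math


def safe_ln(value):
    """Natural logarithm that rejects values outside its domain."""
    if value <= 0:
        raise ValueError(f"ln is undefined for {value}; a positive value is needed")
    return math.log(value)


def identity(value):
    """Leave the variable untransformed."""
    return value


# Each model form maps to (transformation of x, transformation of y).
TRANSFORMS = {
    "y = a + b/x": (lambda x: 1 / x, identity),
    "y = a + b x^2": (lambda x: x**2, identity),
    "y = a + b ln x": (safe_ln, identity),
    "y = a b^x": (identity, safe_ln),
    "y = a x^b": (safe_ln, safe_ln),
}


def transform(model, xs, ys):
    """Return the transformed lists (X, Y) for the chosen model form."""
    fx, fy = TRANSFORMS[model]
    return [fx(x) for x in xs], [fy(y) for y in ys]


# Illustrative values only: the power model takes logarithms of both variables.
X, Y = transform("y = a x^b", [1, 10, 100], [2, 20, 200])
print(X)
print(Y)
```

Calling `transform("y = a b^x", xs, ys)` with any non-positive `y` value raises `ValueError`, which mirrors the domain restriction stated above.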

Reading Parameters After Linearisation

After transformation, the regression line may not directly use the original parameters.

For example, suppose

$$y = a b^{x}.$$

Taking logarithms gives

$$\ln y = \ln a + x \ln b.$$

If the transformed regression line is

$$\ln y = A + B x,$$

then comparing coefficients gives

$$A = \ln a, \quad B = \ln b,$$

so

$$a = e^{A}, \quad b = e^{B}.$$

The intercept and gradient of the transformed line must be translated back carefully.
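As a numerical check of this back-transformation, the sketch below fits $\ln y$ against $x$ and recovers $a$ and $b$. The helper `fit_line` is a hypothetical name, and the data are made up to follow $y = 2 \cdot 3^{x}$ exactly, so the recovered values should be close to $a = 2$ and $b = 3$.

```python
import math


def fit_line(xs, ys):
    """Least-squares regression line of ys on xs, as (intercept, gradient)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    gradient = sxy / sxx
    return my - gradient * mx, gradient


# Illustrative data only, generated from y = 2 * 3^x.
xs = [0, 1, 2, 3, 4]
ys = [2 * 3**x for x in xs]

# Transform y only, then fit ln y = A + B x.
Y = [math.log(y) for y in ys]
A, B = fit_line(xs, Y)

# Translate the transformed parameters back: a = e^A, b = e^B.
a, b = math.exp(A), math.exp(B)

print(round(A, 4), round(B, 4))  # intercept and gradient of the transformed line
print(round(a, 4), round(b, 4))  # parameters of the original model y = a b^x
```

The printed intercept and gradient are $\ln a$ and $\ln b$, not $a$ and $b$ themselves, which is the distinction this section is about.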

Choosing Between Models

When several possible transformations are suggested, compare:

the transformed scatter diagrams
the values of $|r|$ for the transformed data
the reasonableness of the model in context
the domain restrictions introduced by the transformation

Do not choose a model from calculator output alone.

A good exam sentence is:

The plot of $\ln y$ against $x$ is approximately linear and has $|r|$ closest to $1$, so an exponential model is most suitable among the proposed models.
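The $|r|$ part of this comparison can be sketched as follows. The helper `pearson_r` and the dictionary names are hypothetical, and the data are made up to follow $y = 4 \times 1.5^{x}$ exactly. The code covers only the numerical comparison; the scatter diagrams, context and domain restrictions still have to be checked separately.

```python
import math


def pearson_r(xs, ys):
    """Product moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Illustrative data only, generated from y = 4 * 1.5^x.
xs = [1, 2, 3, 4, 5, 6]
ys = [4 * 1.5**x for x in xs]

# Each proposed model pairs a transformed X list with a transformed Y list.
candidates = {
    "reciprocal: y against 1/x": ([1 / x for x in xs], ys),
    "exponential: ln y against x": (xs, [math.log(y) for y in ys]),
    "power: ln y against ln x": (
        [math.log(x) for x in xs],
        [math.log(y) for y in ys],
    ),
}

abs_r = {name: abs(pearson_r(X, Y)) for name, (X, Y) in candidates.items()}
best = max(abs_r, key=abs_r.get)

for name, value in abs_r.items():
    print(f"{name}: |r| = {value:.4f}")
print("largest |r|:", best)
```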

Core Example: Power Model

Suppose a relationship is suspected to have the form

$$y = a x^{b}, \quad x > 0, \quad y > 0.$$

Taking logarithms:

$$\ln y = \ln a + b \ln x.$$

So if we plot $\ln y$ against $\ln x$ and obtain an approximately straight-line pattern, the model is supported.

If the fitted transformed line is

$$\ln y = 1.2 + 0.75 \ln x,$$

then

$$\ln a = 1.2, \quad b = 0.75.$$

Hence

$$a = e^{1.2},$$

and the fitted original model is

$$y = e^{1.2} x^{0.75}.$$
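A short Python check of this example evaluates $a = e^{1.2}$ and confirms that predicting from the original model agrees with predicting from the transformed line and then exponentiating. The function name `predict` and the input $x = 16$ are hypothetical choices for illustration.

```python
import math

# Intercept and gradient of the fitted line ln y = A + B ln x from the example.
A, B = 1.2, 0.75

# Only the intercept needs back-transforming: a = e^A, while b = B.
a, b = math.exp(A), B


def predict(x):
    """Fitted power model y = a x^b, valid for x > 0."""
    if x <= 0:
        raise ValueError("the power model here requires x > 0")
    return a * x**b


x0 = 16  # hypothetical input value
y_direct = predict(x0)
y_via_logs = math.exp(A + B * math.log(x0))  # predict ln y, then exponentiate

print(round(a, 3))
print(round(y_direct, 3), round(y_via_logs, 3))
```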

Core Example: Reciprocal Model

Suppose the data decrease quickly at first and then level off. A possible model is

$$y = a + \frac{b}{x}.$$

Let

$$X = \frac{1}{x}.$$

Then

$$y = a + b X.$$

So the transformed scatter diagram should plot $y$ against $\frac{1}{x}$, not $y$ against $x$.
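The reciprocal fit can be sketched in the same way. The helper `fit_line` is a hypothetical name, and the data are made up to follow $y = 2 + \frac{6}{x}$ exactly, so the fitted intercept and gradient should be close to $a = 2$ and $b = 6$.

```python
def fit_line(xs, ys):
    """Least-squares regression line of ys on xs, as (intercept, gradient)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    gradient = sxy / sxx
    return my - gradient * mx, gradient


# Illustrative data only, generated from y = 2 + 6/x.
xs = [1, 2, 3, 4, 6]
ys = [2 + 6 / x for x in xs]

# Regress y on X = 1/x, not on x itself.
X = [1 / x for x in xs]
a, b = fit_line(X, ys)

print(round(a, 4), round(b, 4))

# For large x the term b/x shrinks, so the predicted y levels off towards a.
y_far = a + b / 1000
print(round(y_far, 4))
```

No back-transformation is needed here: the intercept and gradient of the transformed line are already $a$ and $b$.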

Common Pitfalls

Applying linear regression to a clearly curved original scatter diagram without considering transformation.
Choosing the largest $|r|$ without checking the transformed scatter diagram.
Forgetting domain restrictions such as $x > 0$ or $y > 0$ before taking logarithms.
Treating the intercept and gradient of the transformed line as if they were always the original parameters.
Back-transforming incorrectly, especially in exponential and power models.
Forgetting to interpret the final model in the original variables.

Revision Checklist

Can you explain why linearisation is useful?
Can you match common model forms to the correct transformed variables?
Can you decide whether to plot $y$ against $\frac{1}{x}$, $y$ against $x^{2}$, $\ln y$ against $x$, or $\ln y$ against $\ln x$?
Can you recover original parameters after a logarithmic transformation?
Can you explain why a transformed model still needs contextual judgement?
