Data Transformations and Linearisation

Scope Label

Core 9758. Data transformations are explicitly part of the current correlation-and-regression scope.

Role in the Topic

This branch explains what to do when the original scatter diagram is not approximately linear but suggests a structured non-linear relationship.

Why Transform Data?

Correlation and linear regression are designed for linear patterns.

If the scatter diagram is curved, a straight-line regression on the original variables may be misleading. However, some curved relationships can be rewritten as straight-line relationships by transforming one or both variables.

This is called linearisation.

Caption: Linearisation transforms a curved relationship into a new variable pair that can be modelled linearly.

The aim is not to force every dataset into a line. The aim is to test a plausible model form.

The Linearisation Workflow

A reliable workflow is:

  1. Inspect the original scatter diagram.
  2. Identify a plausible non-linear model.
  3. Define the transformed variable or variables.
  4. Draw or inspect the transformed scatter diagram.
  5. Compute the product moment correlation coefficient $r$ for the transformed variables.
  6. Fit a regression line to the transformed variables.
  7. Interpret the fitted model in the original context.

The model with $|r|$ closest to $1$ is often better supported, but it should still make contextual sense.
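The workflow above can be sketched in Python. This is a minimal illustration, not a prescribed method: the data are made up to look roughly exponential, the helper names `pearson_r` and `fit_line` are hypothetical, and the candidate model $y = ae^{bx}$ is assumed purely for the example.

```python
import math

def pearson_r(xs, ys):
    """Product moment correlation coefficient of paired data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def fit_line(xs, ys):
    """Least-squares regression line of ys on xs, as (gradient, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    gradient = sxy / sxx
    return gradient, mean_y - gradient * mean_x

# Steps 1-2: hypothetical data whose scatter curves upwards (assumed exponential).
x = [1, 2, 3, 4, 5, 6]
y = [2.7, 4.1, 6.0, 9.2, 13.4, 20.3]

# Step 3: define the transformed variable Y = ln y (x is left unchanged).
ln_y = [math.log(v) for v in y]

# Steps 4-5: compare r before and after the transformation.
r_original = pearson_r(x, y)
r_transformed = pearson_r(x, ln_y)

# Step 6: regression line of ln y on x.
gradient, intercept = fit_line(x, ln_y)

# Step 7: translate back to y = a * e^(b x), with a = e^intercept and b = gradient.
a, b = math.exp(intercept), gradient

print(f"r (y against x)    = {r_original:.4f}")
print(f"r (ln y against x) = {r_transformed:.4f}")
print(f"fitted model: y = {a:.3f} * e^({b:.3f} x)")
```

The printed values of $r$ only support the exponential form; the transformed scatter diagram and the context still need to be checked.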

Common Transformation Forms

The goal is to rewrite the model in the form

$$Y = c + mX$$

for suitable transformed variables $X$ and $Y$.

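Original model           Transformation                   Linear form
$y = a + bx^2$           let $X = x^2$                    $y = a + bX$
$y = a + \frac{b}{x}$    let $X = \frac{1}{x}$            $y = a + bX$
$y = a + b\ln x$         let $X = \ln x$                  $y = a + bX$
$y = ae^{bx}$            let $Y = \ln y$                  $Y = \ln a + bx$
$y = ax^b$               let $Y = \ln y$, $X = \ln x$     $Y = \ln a + bX$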

Logarithmic transformations require positive values where the logarithm is taken. For example, $\ln x$ is only defined when $x > 0$.

Reading Parameters After Linearisation

After transformation, the regression line may not directly use the original parameters.

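For example, suppose

$$y = ae^{bx}.$$

Taking logarithms gives

$$\ln y = \ln a + bx.$$

If the transformed regression line is

$$\ln y = c + mx,$$

then

$$\ln a = c, \quad b = m.$$

So

$$a = e^{c}.$$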

The intercept and gradient of the transformed line must be translated back carefully.
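As a small illustration of translating back, the sketch below assumes an exponential model $y = ae^{bx}$ fitted through the line of $\ln y$ on $x$. The function name `exponential_parameters` and the fitted values are hypothetical, chosen so that the answer is easy to check by hand.

```python
import math

def exponential_parameters(intercept, gradient):
    """Translate the line ln y = intercept + gradient * x back to y = a * e^(b x)."""
    a = math.exp(intercept)  # ln a = intercept, so a = e^intercept
    b = gradient             # the gradient is b itself, no conversion needed
    return a, b

# Hypothetical fitted line: ln y = ln 3 + 0.5 x
a, b = exponential_parameters(intercept=math.log(3), gradient=0.5)

# Predict y at x = 2 using the original-form model y = a * e^(b x).
y_at_2 = a * math.exp(b * 2)

print(f"a = {a:.4f}, b = {b}, predicted y at x = 2: {y_at_2:.4f}")
```

Only the intercept needs exponentiating here. Under a different model form, for instance $y = ab^{x}$, the gradient would need converting as well.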

Choosing Between Models

When several possible transformations are suggested, compare:

  • the transformed scatter diagrams
  • the values of $r$ for the transformed data
  • the reasonableness of the model in context
  • the domain restrictions introduced by the transformation

Do not choose a model from calculator output alone.

A good exam sentence is:

The plot of $\ln y$ against $x$ is approximately linear and has $|r|$ closest to $1$, so an exponential model is most suitable among the proposed models.
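The numerical part of this comparison can be sketched as follows. The data are hypothetical and generated exactly from $y = 2x^{1.5}$ so that the expected winner is known in advance; the helper `pearson_r` and the candidate labels are illustrative names only.

```python
import math

def pearson_r(xs, ys):
    """Product moment correlation coefficient of paired data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data generated from y = 2 * x^1.5.
x = [1, 2, 4, 8, 16]
y = [2 * v ** 1.5 for v in x]

ln_x = [math.log(v) for v in x]  # valid because every x > 0
ln_y = [math.log(v) for v in y]  # valid because every y > 0
recip_x = [1 / v for v in x]     # valid because no x is 0

# r for each proposed pair of plotted variables.
candidates = {
    "y against x": pearson_r(x, y),
    "y against 1/x": pearson_r(recip_x, y),
    "ln y against x": pearson_r(x, ln_y),
    "ln y against ln x": pearson_r(ln_x, ln_y),
}

for label, r in candidates.items():
    print(f"{label:20s} r = {r:.4f}")

# The candidate with |r| closest to 1; this is evidence, not the final decision.
best = max(candidates, key=lambda label: abs(candidates[label]))
print("largest |r|:", best)
```

The ranking covers only the second bullet point. The scatter diagrams, the context and the domain restrictions still have to be weighed before a model is chosen.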

Core Example: Power Model

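Suppose a relationship is suspected to have the form

$$y = ax^{b}.$$

Taking logarithms:

$$\ln y = \ln a + b\ln x.$$

So if we plot $\ln y$ against $\ln x$ and obtain an approximately straight-line pattern, the model is supported.

If the fitted transformed line is

$$\ln y = c + m\ln x,$$

then

$$\ln a = c, \quad b = m.$$

Hence

$$a = e^{c},$$

and the fitted original model is

$$y = e^{c}x^{m}.$$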
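A numerical version of this back-transformation might look like the sketch below. The fitted line $\ln y = 1.2 + 0.8\ln x$ is a made-up illustration, and `power_model_from_line` is a hypothetical helper name.

```python
import math

def power_model_from_line(intercept, gradient):
    """Turn the line ln y = intercept + gradient * ln x into the model y = a * x^b."""
    a = math.exp(intercept)  # ln a = intercept
    b = gradient             # the power b is the gradient

    def predict(x):
        # The model was fitted through ln x, so it is only meaningful for x > 0.
        if x <= 0:
            raise ValueError("the power model fitted via ln x needs x > 0")
        return a * x ** b

    return a, b, predict

# Hypothetical fitted line: ln y = 1.2 + 0.8 ln x
a, b, predict = power_model_from_line(1.2, 0.8)
y_hat = predict(5.0)

# Round-trip check: ln(y_hat) should lie on the fitted line at ln 5.
on_line = 1.2 + 0.8 * math.log(5.0)

print(f"a = {a:.3f}, b = {b}, predicted y at x = 5: {y_hat:.3f}")
print(f"ln(y_hat) = {math.log(y_hat):.6f}, line value = {on_line:.6f}")
```

Predictions are made in the original variables, which is the form an interpretation in context normally needs.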

Core Example: Reciprocal Model

Suppose the data decrease quickly at first and then level off. A possible model is

$$y = a + \frac{b}{x}.$$

Let

$$X = \frac{1}{x}.$$

Then

$$y = a + bX.$$

So the transformed scatter diagram should plot $y$ against $\frac{1}{x}$, not $y$ against $x$.
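The reciprocal case can be checked numerically with a short sketch. The data are hypothetical and generated exactly from $y = 2 + \frac{6}{x}$, and `fit_line` is an illustrative helper rather than a standard function.

```python
def fit_line(xs, ys):
    """Least-squares regression line of ys on xs, as (gradient, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    gradient = sxy / sxx
    return gradient, mean_y - gradient * mean_x

# Hypothetical data generated from y = 2 + 6/x: a rapid fall that levels off.
x = [1, 2, 3, 4, 6]
y = [8.0, 5.0, 4.0, 3.5, 3.0]

# Transform only the x-variable: X = 1/x (requires x != 0).
X = [1 / v for v in x]

# Regress y on X. Because y itself is untransformed, the intercept and gradient
# are the original parameters a and b directly; no back-transformation is needed.
b, a = fit_line(X, y)

print(f"fitted model: y = {a:.3f} + {b:.3f}/x")
```

This contrasts with the logarithmic models, where at least one parameter has to be recovered by exponentiating.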

Common Pitfalls

  • Applying linear regression to a clearly curved original scatter diagram without considering transformation.
  • Choosing the largest $|r|$ without checking the transformed scatter diagram.
  • Forgetting domain restrictions such as $x > 0$ or $y > 0$ before taking logarithms.
  • Treating the intercept and gradient of the transformed line as if they were always the original parameters.
  • Back-transforming incorrectly, especially in exponential and power models.
  • Forgetting to interpret the final model in the original variables.

Revision Checklist

  • Can you explain why linearisation is useful?
  • Can you match common model forms to the correct transformed variables?
  • Can you decide whether to plot $y$ against $\frac{1}{x}$, $y$ against $\ln x$, $\ln y$ against $x$, or $\ln y$ against $\ln x$?
  • Can you recover original parameters after a logarithmic transformation?
  • Can you explain why a transformed model still needs contextual judgement?