Probability Distribution and CDF

Scope Label

Core 9758. This note covers the distribution-table and cumulative-probability language needed for discrete random variables.

Use it with the hub Discrete Random Variables.

Notation Recap

This note assumes that $X$ is a discrete random variable.

Keep three objects separate:

| Symbol | Meaning |
| --- | --- |
| $X$ | the random variable |
| $x$ | one possible value of the random variable |
| $X = x$ | the event that $X$ takes the value $x$ |

So $P(X = x)$ is the probability of an event, even though it is written using numerical notation.

From Experiment to Distribution

There is a natural chain:

  1. start with a random experiment
  2. define a random variable from that experiment
  3. identify the possible values of the variable
  4. attach probabilities to those values
  5. use the distribution to calculate probabilities and summaries

The experiment comes first. The random variable is built from the experiment. The distribution is built from the random variable.

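For example, suppose two fair coins are tossed and $X$ is the number of heads.

| Outcome | Value of $X$ |
| --- | --- |
| HH | $2$ |
| HT | $1$ |
| TH | $1$ |
| TT | $0$ |

The possible values of $X$ are $0, 1, 2$, and the probability distribution is:

| $x$ | $0$ | $1$ | $2$ |
| --- | --- | --- | --- |
| $P(X = x)$ | $\frac{1}{4}$ | $\frac{1}{2}$ | $\frac{1}{4}$ |

The distribution no longer lists the original outcomes. It lists the possible numerical values of $X$ and their probabilities.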
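The chain from experiment to distribution can be sketched in a few lines of Python. This is only an illustration for the two-coin example: the names `outcomes`, `number_of_heads`, and `distribution` are made up for this sketch, and exact fractions are used so that no rounding appears.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Step 1: list the equally likely outcomes of tossing two fair coins.
outcomes = list(product("HT", repeat=2))  # ('H', 'H'), ('H', 'T'), ('T', 'H'), ('T', 'T')

# Step 2: define the random variable as a function of the outcome.
def number_of_heads(outcome):
    return outcome.count("H")

# Steps 3 and 4: collect the possible values and attach a probability to each.
counts = Counter(number_of_heads(o) for o in outcomes)
distribution = {x: Fraction(n, len(outcomes)) for x, n in sorted(counts.items())}

# Step 5: the distribution is now ready to use.
for x, p in distribution.items():
    print(f"P(X = {x}) = {p}")
```

Two different outcomes (HT and TH) give the same value of the variable, which is why the value 1 collects twice the probability of the other two values.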

Probability Distribution Function

The probability distribution function of a discrete random variable tells us the probability attached to each possible value.

If $X$ can take values $x_1, x_2, \dots, x_n$, then the distribution gives:

$$P(X = x_i)$$

for each possible value $x_i$.

In some textbooks this exact-value function is called the probability mass function. In this wiki, “probability distribution function” means the same discrete table of probabilities.

For a discrete random variable, pdf-style thinking answers “exactly this value”.

For example:

  • exactly $k$ heads
  • exactly $k$ defective items
  • exactly $k$ arrivals
  • exactly $k$ successes

Conceptually, the distribution function is the full probability model for the variable. It tells us:

  • which values are possible
  • how likely each value is
  • how later quantities such as expectation and variance can be calculated

Two structural facts must always hold:

$$0 \le P(X = x) \le 1$$

for every possible value, and

$$\sum_{x} P(X = x) = 1$$

across all possible values.

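The first fact says that each entry is a valid probability. The second holds because the events $X = x$ are mutually exclusive and together exhaustive, so exactly one of them occurs.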

Caption: A discrete probability distribution assigns a valid probability to each possible value, and the probabilities must sum to $1$.

Checking and Constructing a Valid Distribution

A proposed distribution must be checked before it is used.

Ask two questions:

  1. Are all probabilities valid?
  2. Do all probabilities add to $1$?
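The two questions can be turned into a small check. The sketch below assumes the distribution is stored as a Python dictionary from values to exact `Fraction` probabilities; the function name `is_valid_distribution` and the two sample tables are invented for illustration.

```python
from fractions import Fraction

def is_valid_distribution(table):
    """Return True if every probability lies in [0, 1] and the total is exactly 1."""
    probabilities = list(table.values())
    all_in_range = all(0 <= p <= 1 for p in probabilities)  # question 1
    total_is_one = sum(probabilities) == 1                  # question 2
    return all_in_range and total_is_one

# Illustrative tables: the second one sums to 5/4, so it fails question 2.
good = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
bad = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 2)}

print(is_valid_distribution(good))
print(is_valid_distribution(bad))
```

If the probabilities were stored as decimals instead of fractions, an exact comparison with 1 could fail through rounding, and a tolerance check such as `math.isclose` would be the safer choice.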

For example, suppose $X$ has distribution:

To make this a valid distribution:

So:

and hence:

Therefore:

The important idea is not just solving for the unknown constant. It is using the fact that all possible values of $X$ must account for total probability $1$.
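As a separate illustration of the same idea, the sketch below uses a hypothetical table that is not taken from this note: $P(X = x)$ equals $c$, $2c$, $3c$, $4c$ for $x = 1, 2, 3, 4$, where $c$ is an unknown constant. The names `weights`, `c`, and `distribution` are assumptions made for this sketch.

```python
from fractions import Fraction

# Hypothetical table (for illustration only): P(X = x) = weight * c.
weights = {1: 1, 2: 2, 3: 3, 4: 4}

# The probabilities must sum to 1, so c * (1 + 2 + 3 + 4) = 1.
c = Fraction(1, sum(weights.values()))

# Substitute c back in to get the actual probabilities.
distribution = {x: w * c for x, w in weights.items()}

print("c =", c)
for x, p in distribution.items():
    print(f"P(X = {x}) = {p}")

# Confirm that the completed table is a valid distribution.
assert all(0 <= p <= 1 for p in distribution.values())
assert sum(distribution.values()) == 1
```

Solving for the constant is only half the job. The final two checks confirm that the completed table satisfies both structural facts.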

Cumulative Distribution Function

The cumulative distribution function records running probability totals.

For a discrete random variable $X$,

$$F(x) = P(X \le x)$$

is the probability that the value of $X$ does not exceed $x$.

The pdf and cdf answer different questions:

  • the pdf gives probability at a single value
  • the cdf gives total probability up to a value

That is why the cdf is naturally:

  • between $0$ and $1$
  • non-decreasing

The cdf is useful when the wording includes:

  • at most
  • not more than
  • up to
  • no greater than

For example, if $X$ represents the number of defective items in a sample, then:

$$P(X \le k)$$

means the probability of at most $k$ defective items:

$$P(X \le k) = P(X = 0) + P(X = 1) + \dots + P(X = k)$$

Caption: The cdf is built by adding probabilities from the distribution up to the chosen value.
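Building a cdf table from a distribution table is a running sum. The sketch below uses the two-coin distribution as sample data; the variable names `pmf`, `values`, and `cdf` are chosen for this illustration only.

```python
from fractions import Fraction
from itertools import accumulate

# Exact-value probabilities for the number of heads in two fair coin tosses.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

# Work through the values in increasing order and keep a running total.
values = sorted(pmf)
running_totals = accumulate(pmf[x] for x in values)
cdf = dict(zip(values, running_totals))

for x, total in cdf.items():
    print(f"P(X <= {x}) = {total}")
```

Each running total adds a non-negative probability to the previous one, which is why the cdf never decreases and ends at 1.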

Translating Inequality Language

Many discrete-random-variable questions are about translating words into precise inequality notation.

For integer-valued $X$:

$$P(X < k) = P(X \le k - 1)$$

This is because if $X$ only takes integer values, being less than $k$ means being at most $k - 1$.

Similarly:

$$P(X \ge k) = 1 - P(X < k) = 1 - P(X \le k - 1)$$

This uses the complement.

For an interval:

$$P(a \le X \le b) = P(X \le b) - P(X \le a - 1)$$

This removes everything up to $a - 1$, leaving $a, a + 1, \dots, b$.
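These translations for an integer-valued variable can be written as small helper functions on top of a cdf. The sketch below again uses the two-coin distribution as sample data; the function names `cdf`, `prob_less_than`, `prob_at_least`, and `prob_between` are invented for this illustration.

```python
from fractions import Fraction

# Sample integer-valued distribution: number of heads in two fair coin tosses.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def cdf(k):
    """P(X <= k): add the probabilities of all values not exceeding k."""
    return sum((p for x, p in pmf.items() if x <= k), Fraction(0))

def prob_less_than(k):
    """P(X < k) = P(X <= k - 1) for integer-valued X."""
    return cdf(k - 1)

def prob_at_least(k):
    """P(X >= k) = 1 - P(X <= k - 1), using the complement."""
    return 1 - cdf(k - 1)

def prob_between(a, b):
    """P(a <= X <= b) = P(X <= b) - P(X <= a - 1), both endpoints included."""
    return cdf(b) - cdf(a - 1)

print(prob_less_than(2))
print(prob_at_least(1))
print(prob_between(1, 2))
```

In every helper the boundary passed to `cdf` is shifted by one where the wording excludes or includes an endpoint, which is the step most often missed by hand.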

Translation Guide

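| Wording | Mathematical form | Common method |
| --- | --- | --- |
| exactly $k$ | $P(X = k)$ | use pdf |
| at most $k$ | $P(X \le k)$ | use cdf |
| not more than $k$ | $P(X \le k)$ | use cdf |
| fewer than $k$ | $P(X < k) = P(X \le k - 1)$ | convert if $X$ is integer-valued |
| at least $k$ | $P(X \ge k) = 1 - P(X \le k - 1)$ | often use complement |
| more than $k$ | $P(X > k) = 1 - P(X \le k)$ | often use complement |
| between $a$ and $b$ inclusive | $P(a \le X \le b)$ | subtract cumulative probabilities carefully |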

The main habit is to convert words into precise probability notation before calculating.

Worked Example: Distribution and CDF

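Let $X$ be the number of heads when two fair coins are tossed.

The probability distribution is:

| $x$ | $0$ | $1$ | $2$ |
| --- | --- | --- | --- |
| $P(X = x)$ | $\frac{1}{4}$ | $\frac{1}{2}$ | $\frac{1}{4}$ |

The cdf is:

| $x$ | $0$ | $1$ | $2$ |
| --- | --- | --- | --- |
| $P(X \le x)$ | $\frac{1}{4}$ | $\frac{3}{4}$ | $1$ |

For example:

$$P(X \le 1) = P(X = 0) + P(X = 1) = \frac{1}{4} + \frac{1}{2} = \frac{3}{4}$$

Also:

$$P(X \ge 1) = 1 - P(X \le 0) = 1 - \frac{1}{4} = \frac{3}{4}$$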

The values are simple here, but the method is the same for larger distribution tables.

Common Pitfalls

MistakeBetter thinking
| Mistake | Better thinking |
| --- | --- |
| Using a table before checking that probabilities sum to $1$ | First verify that the proposed distribution is valid |
| Treating $P(X = x)$ and $P(X \le x)$ as the same object | The pdf is exact-value probability; the cdf is accumulated probability |
| Translating $P(X < k)$ as $P(X \le k)$ for integer-valued $X$ | If $X$ is integer-valued, $P(X < k)$ means $P(X \le k - 1)$ |
| Forgetting endpoint inclusion in intervals | Read $<$, $\le$, $>$, and $\ge$ carefully |
| Using complement without checking the boundary | For integer-valued variables, the complement boundary shifts by one |

Revision Checklist

  • Can you build a distribution table from a simple experiment?
  • Can you check that all probabilities are valid?
  • Can you use $\sum P(X = x) = 1$ to find an unknown constant?
  • Can you form a cdf table from a pdf table?
  • Can you explain why a cdf is non-decreasing?
  • Can you translate “at most”, “fewer than”, “at least”, and “between” correctly?