Probability and Inference

1 Probability Space and Events

Two Ingredients to Modeling Uncertainty

When we think of an uncertain world, we will always think of there being some underlying experiment of interest. To model this uncertain world, it suffices to keep track of two things:

  1. The set of all possible outcomes for the experiment: this set is called the sample space and is usually denoted by the capital Greek letter omega, Ω.

  2. The probability of each outcome: for each possible outcome, assign a probability that is at least 0 and at most 1; this assignment of probabilities is usually denoted by P.

model = {'head': 0.5, 'tail': 0.5}  # a fair coin flip: map each outcome to its probability
sample_space = set(model.keys())    # the sample space Ω is the set of keys

Important remarks:
(1)
The sample space is always specified to be collectively exhaustive, meaning that every possible outcome is in it, and mutually exclusive, meaning that once the experiment is run (e.g., flipping the fair coin), exactly one possible outcome in the sample space happens. It’s impossible for multiple outcomes in the sample space to simultaneously happen! It’s also impossible for none of the outcomes to happen!

(2)

Probabilities can be thought of as fractions of the time that outcomes occur; thus, each probability is at least 0 and at most 1.

(3)
If we add up the probabilities of all the possible outcomes in the sample space, we get 1.
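As a quick sanity check, we can confirm this in code for the coin flip model from earlier (a minimal sketch; exact equality happens to work here because 0.5 is exactly representable in floating point, but in general a tolerance is safer):

assert sum(model.values()) == 1  # probabilities across all outcomes add up to 1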

PROBABILITY SPACES

At this point, we’ve actually already seen the most basic data structure used throughout this course for modeling uncertainty, called a finite probability space (in this course, we’ll often also just call this either a probability space or a probability model):

A finite probability space consists of two ingredients:
(1)
a sample space Ω consisting of a finite (i.e., not infinite) number of collectively exhaustive and mutually exclusive possible outcomes

(2)
an assignment of probabilities: for each possible outcome ω ∈ Ω, we assign a probability P(ω) that is at least 0 and at most 1, where we require that the probabilities across all the possible outcomes in the sample space add up to 1:

∑_{ω ∈ Ω} P(ω) = 1

Notes:

As shorthand we occasionally use the tuple “(Ω, P)” to refer to a finite probability space to remind ourselves of the two ingredients needed, sample space Ω and an assignment of probabilities P. As we already saw, in code these two pieces can be represented together in a single Python dictionary. However, when we want to reason about probability spaces in terms of the mathematics, it’s helpful to have names for the two pieces.

TABLE REPRESENTATION

A probability space is a data structure in the sense that we can always visualize it as a table of non-negative entries that sum to 1. Let’s see a concrete example of this, first writing the table out on paper and then coding it up.

Example: Suppose we have a model of tomorrow’s weather given as follows: sunny with probability 1/2, rainy with probability 1/6, and snowy with probability 1/3. Here’s the probability space, shown as a table:

state   probability
sunny   1/2
rainy   1/6
snowy   1/3

Note: This is a table of 3 non-negative entries that sum to 1. The rows correspond to the sample space: Ω = {sunny, rainy, snowy}

We will often use this table representation of a probability space to tell you how we’re modeling uncertainty for a particular problem. It provides the simplest of visualizations of a probability space.

Of course, in Python code, the above probability space is given by:

 prob_space = {'sunny': 1/2, 'rainy': 1/6, 'snowy': 1/3}

A different way to code up the same probability space is to separately specify the outcomes (i.e., the sample space) and the probabilities:

import numpy as np

outcomes = ['sunny', 'rainy', 'snowy']
probabilities = np.array([1/2, 1/6, 1/3])
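The two encodings carry the same information; for instance, we can rebuild the dictionary from the parallel lists and sanity-check the probabilities (a minimal sketch; prob_space_rebuilt is a name introduced here):

prob_space_rebuilt = dict(zip(outcomes, probabilities))  # same contents as prob_space above
assert np.isclose(probabilities.sum(), 1)  # the probabilities sum to 1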

Events

When we model some uncertain situation, how we specify a sample space is not unique. We saw an example of this already in an earlier exercise: for rolling a single six-sided die, we can choose to name the outcomes differently, saying for instance “roll 1” instead of “1”. We could even add a bunch of extraneous outcomes that all have probability 0, or extraneous information that doesn’t matter, such as “Alice rolls 1”, “Bob rolls 1”, etc., where we enumerate all the people who could roll the die resulting in a 1. Depending on the problem we are trying to solve, knowing who rolled the die might be important, but if we don’t care who rolled the die, then the information isn’t helpful, even though it’s still possible to include it in the sample space.

Generally speaking, it’s best to choose a sample space that is as simple as possible for modeling what we care about. For example, if we roll a six-sided die and we only care about whether the face shows at least 4 or not, then it suffices to keep track of two outcomes, “at least 4” and “less than 4”.
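For instance, here is one way to build the coarser model from the fine-grained one in code (a sketch; die_model and coarse_model are names introduced here, and a fair die shows at least 4 with probability 3/6 = 1/2):

die_model = {face: 1/6 for face in range(1, 7)}  # fine-grained: six equally likely faces

# coarser model: only track whether the face is at least 4
coarse_model = {
    'at least 4': sum(p for face, p in die_model.items() if face >= 4),
    'less than 4': sum(p for face, p in die_model.items() if face < 4),
}
# coarse_model is {'at least 4': 0.5, 'less than 4': 0.5}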

Code for set operations in Python:

sample_space = {'HH', 'HT', 'TH', 'TT'}
A = {'HT', 'TT'}
B = {'HH', 'HT', 'TH'}
C = {'HH'}
A_intersect_B = A.intersection(B)  # equivalently: B.intersection(A) or A & B
A_union_C = A.union(C)  # equivalently: C.union(A) or A | C
B_complement = sample_space.difference(B)  # equivalently: sample_space - B
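Running these, we can print the results (the order in which a set’s elements are printed may vary):

print(A_intersect_B)  # {'HT'}
print(A_union_C)      # {'HH', 'HT', 'TT'}
print(B_complement)   # {'TT'}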

Special events: Ω and ∅ (the empty set)

P(Ω) = 1 and P(∅) = 0
if A ⊆ B, then P(A) ≤ P(B) ≤ 1
if A ∩ B = ∅, then P(A ∪ B) = P(A) + P(B)
P(Aᶜ) = 1 − P(A)

We see that an event is a subset of the sample space Ω. If you remember our table representation for a probability space, then an event can be thought of as a subset of the rows, and the probability of the event is just the sum of the probability values in those rows!

The probability of an event A ⊆ Ω is the sum of the probabilities of the possible outcomes in A:

P(A) = ∑_{ω ∈ A} P(ω)

We can translate the above equation into Python code. In particular, we can compute the probability of an event encoded as a Python set event, where the probability space is encoded as a Python dictionary prob_space:
def prob_of_event(event, prob_space):
    return sum(prob_space[outcome] for outcome in event)
prob_space = {'sunny': 1/2, 'rainy': 1/6, 'snowy': 1/3}
rainy_or_snowy_event = {'rainy', 'snowy'}
print(prob_of_event(rainy_or_snowy_event, prob_space))
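As a quick illustration of the complement rule P(Aᶜ) = 1 − P(A) from earlier, we can check it numerically with the same weather model (a small sketch; floating point happens to land exactly on 0.5 here):

sunny_event = {'sunny'}
not_sunny_event = set(prob_space.keys()) - sunny_event  # the complement of the event
print(prob_of_event(not_sunny_event, prob_space))  # 0.5
print(1 - prob_of_event(sunny_event, prob_space))  # 0.5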

2 Random Variables

In general, given (Ω, P), a random variable X is a mapping from Ω to a set 𝒳, where 𝒳 is the set of values that X takes on.

Approach 1:

def sample_from_finite_probability_space(finite_prob_space):
    """ Produces a random outcome from a given finite probability space. Input ----- - finite_prob_space: finite probability space encoded as a dictionary Output ------ - random outcome, which is one of the keys in the finite_probability_space dictionary's set of keys (remember: these keys form the sample space) """

    # first produce a list of pairs of the form (outcome, outcome probability)
    outcome_probability_pairs = list(finite_prob_space.items())

    # convert the pairs into two lists "outcomes" and "outcome_probabilities":
    # - outcomes: list of outcomes
    # - outcome_probabilities: i-th element is the probability of the i-th
    # outcome in the "outcomes" list
    # (note that this step is needed because NumPy wants these lists
    # separately)
    outcomes, outcome_probabilities = zip(*outcome_probability_pairs)

    # use NumPy to randomly sample
    random_outcome = np.random.choice(outcomes, p=outcome_probabilities)
    return random_outcome

prob_space = {'sunny': 1/2, 'rainy': 1/6, 'snowy': 1/3}
W_mapping = {'sunny': 'sunny', 'rainy': 'rainy', 'snowy': 'snowy'}
I_mapping = {'sunny': 1, 'rainy': 0, 'snowy': 0}
random_outcome = sample_from_finite_probability_space(prob_space)
W = W_mapping[random_outcome]
I = I_mapping[random_outcome]
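To sanity-check approach 1, we can repeat the experiment many times and verify that the fraction of trials in which I = 1 comes out close to 1/2 (a simulation sketch; the number of trials is an arbitrary choice):

n_trials = 10000  # arbitrary choice
num_ones = 0
for _ in range(n_trials):
    outcome = sample_from_finite_probability_space(prob_space)
    if I_mapping[outcome] == 1:
        num_ones += 1
print(num_ones / n_trials)  # should be close to 1/2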

Approach 2:

W_table = {'sunny': 1/2, 'rainy': 1/6, 'snowy': 1/3}
I_table = {0: 1/2, 1: 1/2}

W = sample_from_finite_probability_space(W_table)
I = sample_from_finite_probability_space(I_table)
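The tables in approach 2 are not pulled out of thin air: each can be derived from approach 1 by summing, for every value the random variable takes on, the probabilities of the outcomes that map to that value. A sketch for I, reusing prob_space and I_mapping from above (I_table_derived is a name introduced here):

I_table_derived = {}
for outcome, probability in prob_space.items():
    value = I_mapping[outcome]  # the value that I takes on for this outcome
    I_table_derived[value] = I_table_derived.get(value, 0) + probability
# I_table_derived is {1: 0.5, 0: 0.5}, matching I_table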

RANDOM VARIABLES NOTATION AND TERMINOLOGY

In this course, we denote random variables with capital/uppercase letters, such as X, W, I, etc.

We use the phrases “probability table”, “probability mass function” (abbreviated as PMF), and “probability distribution” (often simply called a distribution) to mean the same thing, and in particular we denote the probability table for X by p_X or p_X(⋅).

We write p_X(x) to denote the entry of the probability table that has label x ∈ 𝒳, where 𝒳 is the set of values that random variable X takes on. Note that we use lowercase letters like x to denote variables storing nonrandom values. We can also look up values in a probability table using specific outcomes, e.g., from earlier, we have p_W(rainy) = 1/6 and p_I(1) = 1/2.

Note that we use the same notation as in math, where a function f might also be written as f(⋅) to explicitly indicate that it is a function of one variable. Both f and f(⋅) refer to a function, whereas f(x) refers to the value of the function f evaluated at the point x.

As an example of how to use all this notation, recall that a probability table consists of non-negative entries that add up to 1. In fact, each of the entries is at most 1 (otherwise the numbers would add up to more than 1). For a random variable X taking on values in 𝒳, we can write out these constraints as:

0 ≤ p_X(x) ≤ 1, for all x ∈ 𝒳

∑_{x ∈ 𝒳} p_X(x) = 1

Often in the course, if we are making statements about all possible outcomes of X, we will omit writing out the alphabet 𝒳 explicitly. For example, instead of the above, we might write the following equivalent statements:

0 ≤ p_X(x) ≤ 1, for all x

∑_x p_X(x) = 1
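These constraints are easy to check in code. Below is a minimal sketch (is_valid_pmf is a helper name introduced here; a small tolerance absorbs floating-point error):

def is_valid_pmf(table, tol=1e-9):
    """Check that every entry is in [0, 1] and that the entries sum to 1."""
    entries = list(table.values())
    return all(0 <= p <= 1 for p in entries) and abs(sum(entries) - 1) < tol

print(is_valid_pmf({'sunny': 1/2, 'rainy': 1/6, 'snowy': 1/3}))  # True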

Exercise:
Functions of Random Variables
Consider the random variable W that we have seen before, where W = sunny with probability 1/2, W = rainy with probability 1/6, and W = snowy with probability 1/3. Consider a function f that maps ‘sunny’ and ‘rainy’ to 3, and ‘snowy’ to 42.

f(W) is also a random variable. Express the probability table for f(W) as a Python dictionary.

{3: 2/3, 42: 1/3}

Is (f(W))² also a random variable? If yes, provide the probability table for (f(W))² as a Python dictionary.

{9: 2/3, 1764: 1/3}

In general, for a real-valued function g (i.e., it maps real numbers to real numbers), is g(f(W)) a random variable? Yes!
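We can verify both answers programmatically by “pushing” W’s probability table through the function, summing the probabilities of values that map to the same result (a sketch; f here is written as a plain Python function, and pmf_of_function is a helper name introduced here):

W_table = {'sunny': 1/2, 'rainy': 1/6, 'snowy': 1/3}

def f(w):
    return 3 if w in ('sunny', 'rainy') else 42

def pmf_of_function(table, func):
    """Probability table of func(X) for a random variable X with the given table."""
    result = {}
    for value, probability in table.items():
        new_value = func(value)
        result[new_value] = result.get(new_value, 0) + probability
    return result

f_of_W_table = pmf_of_function(W_table, f)
print(f_of_W_table)                                   # {3: 2/3, 42: 1/3}, printed as decimals
print(pmf_of_function(f_of_W_table, lambda x: x**2))  # {9: 2/3, 1764: 1/3}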

3 Jointly Distributed Random Variables

RELATING TWO RANDOM VARIABLES

At the most basic level, inference refers to using an observation to reason about some unknown quantity. In this course, the observation and the unknown quantity are represented by random variables. The main modeling question is: How do these random variables relate?

Let’s build on our earlier weather example, where now another outcome of interest appears: the temperature, which we quantize into two possible values, “hot” and “cold”. Let’s suppose that we have the following probability space: