Probability and Inference
1 Probability Spaces and Events
Two Ingredients to Modeling Uncertainty
When we think of an uncertain world, we will always think of there being some underlying experiment of interest. To model this uncertain world, it suffices to keep track of two things:
(1)
The set of all possible outcomes for the experiment: this set is called the sample space and is usually denoted by the Greek letter Omega, Ω.
(2)
The probability of each outcome: for each possible outcome, assign a probability that is at least 0 and at most 1. We denote this assignment of probabilities by P, so the probability of outcome ω is P(ω).
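# fair coin: each outcome is a key, and its probability is the value
model = {'head': 0.5, 'tail': 0.5}
# the sample space is the set of outcomes (the dictionary's keys)
sample_space = set(model.keys())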
important remarks:
(1)
The sample space is always specified to be collectively exhaustive, meaning that every possible outcome is in it, and mutually exclusive, meaning that once the experiment is run (e.g., flipping the fair coin), exactly one possible outcome in the sample space happens. It’s impossible for multiple outcomes in the sample space to simultaneously happen! It’s also impossible for none of the outcomes to happen!
(2)
Probabilities can be thought of as the fraction of times each outcome occurs when the experiment is repeated many times; thus, probabilities are at least 0 and at most 1.
(3)
If we add up the probabilities of all the possible outcomes in the sample space, we get 1.
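Remarks (2) and (3) can be checked mechanically for a model encoded as a dictionary like model above. The sketch below is illustrative: the helper name is_valid_probability_model is hypothetical, and the tolerance tol is an assumption that allows for floating-point rounding when entries such as 1/6 and 1/3 are added. Remark (1) is a property of how we choose the sample space, so it is not something this code can verify; a dictionary does at least guarantee that each outcome is listed only once.

```python
import math

def is_valid_probability_model(model, tol=1e-9):
    """Hypothetical helper: check remarks (2) and (3) for a dict of
    outcome -> probability."""
    # remark (2): every probability is at least 0 and at most 1
    if any(p < 0 or p > 1 for p in model.values()):
        return False
    # remark (3): the probabilities of all outcomes add up to 1
    # (compared with a small tolerance because of floating-point rounding)
    return math.isclose(sum(model.values()), 1.0, abs_tol=tol)

print(is_valid_probability_model({'head': 0.5, 'tail': 0.5}))  # True
print(is_valid_probability_model({'head': 0.7, 'tail': 0.5}))  # False
```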
PROBABILITY SPACES
At this point, we’ve actually already seen the most basic data structure used throughout this course for modeling uncertainty, called a finite probability space (in this course, we’ll often also just call this either a probability space or a probability model):
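A finite probability space consists of two ingredients:
(1)
a sample space Ω consisting of a finite number of outcomes
(2)
an assignment of probabilities: for each possible outcome ω ∈ Ω, a probability P(ω) with 0 ≤ P(ω) ≤ 1, such that the probabilities of all outcomes in Ω sum to 1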
Notes:
As shorthand, we occasionally use the tuple “(Ω, P)” to refer to a finite probability space.
TABLE REPRESENTATION
A probability space is a data structure that we can always visualize as a table of non-negative entries that sum to 1. Let’s see a concrete example of this, first writing the table out on paper and then coding it up.
Example: Suppose we have a model of tomorrow’s weather given as follows: sunny with probability 1/2, rainy with probability 1/6, and snowy with probability 1/3. Here’s the probability space, shown as a table:
state | probability |
---|---|
sunny | 1/2 |
rainy | 1/6 |
snowy | 1/3 |
Note: This is a table of 3 non-negative entries that sum to 1. The rows correspond to the sample space Ω = {sunny, rainy, snowy}.
We will often use this table representation of a probability space to tell you how we’re modeling uncertainty for a particular problem. It provides the simplest of visualizations of a probability space.
Of course, in Python code, the above probability space is given by:
prob_space = {'sunny': 1/2, 'rainy': 1/6, 'snowy': 1/3}
A different way to code up the same probability space is to separately specify the outcomes (i.e., the sample space) and the probabilities:
import numpy as np
# the i-th probability belongs to the i-th outcome
outcomes = ['sunny', 'rainy', 'snowy']
probabilities = np.array([1/2, 1/6, 1/3])
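The two encodings carry the same information, so it is easy to move between them. This is a minimal sketch assuming NumPy is installed; the name rebuilt is illustrative. It relies on Python dictionaries preserving insertion order (Python 3.7+), which keeps the i-th probability aligned with the i-th outcome.

```python
import numpy as np

# dictionary encoding: outcome -> probability
prob_space = {'sunny': 1/2, 'rainy': 1/6, 'snowy': 1/3}

# dictionary -> separate outcomes list and probabilities array
outcomes = list(prob_space.keys())
probabilities = np.array([prob_space[o] for o in outcomes])

# separate lists -> dictionary (converting NumPy floats back to Python floats)
rebuilt = {o: float(p) for o, p in zip(outcomes, probabilities)}

# both encodings describe the same table of non-negative entries summing to 1
print(rebuilt == prob_space)                  # True
print(np.isclose(probabilities.sum(), 1.0))   # True
```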
Events
When we model some uncertain situation, the choice of sample space is not unique. We saw an example of this in an earlier exercise: when rolling a single six-sided die, we can name the outcomes differently, saying for instance “roll 1” instead of “1”. We could even add a bunch of extraneous outcomes that all have probability 0. We could also add extraneous information such as “Alice rolls 1”, “Bob rolls 1”, etc., where we enumerate all the people who could roll the die for the outcome in which the die shows 1. Depending on the problem we are trying to solve, knowing who rolled the die might be important; if we don’t care who rolled it, this information isn’t helpful, but it is still possible to include it in the sample space.
Generally speaking it’s best to choose a sample space that is as simple as possible for modeling what we care about solving. For example, if we were rolling a six-sided die, and we actually only care about whether the face shows up at least 4 or not, then it’s sufficient to just keep track of two outcomes, “at least 4” and “less than 4”.
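As an illustration of simplifying a sample space, the sketch below collapses the six die outcomes into the two outcomes “at least 4” and “less than 4” by adding up the probabilities of the faces each coarse outcome covers. It assumes the die is fair, which the text does not state, and uses Fraction to keep the arithmetic exact.

```python
from fractions import Fraction

# fine-grained model: a fair six-sided die (fairness is an assumption)
die = {face: Fraction(1, 6) for face in range(1, 7)}

# coarse model: keep track only of whether the face is at least 4
coarse = {'at least 4': Fraction(0), 'less than 4': Fraction(0)}
for face, prob in die.items():
    label = 'at least 4' if face >= 4 else 'less than 4'
    coarse[label] += prob

print(coarse)
# {'at least 4': Fraction(1, 2), 'less than 4': Fraction(1, 2)}
```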
code for set operations in Python:
sample_space = {'HH', 'HT', 'TH', 'TT'}
A = {'HT', 'TT'}
B = {'HH', 'HT', 'TH'}
C = {'HH'}
A_intersect_B = A.intersection(B)  # equivalently: B.intersection(A) or A & B
A_union_C = A.union(C)  # equivalently: C.union(A) or A | C
B_complement = sample_space.difference(B)  # equivalently: sample_space - B
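Working the three operations out by hand and checking them in code: with two coin flips, A is “second flip is tails”, B is “at least one heads”, and C is “both heads”. The sketch repeats the set definitions so it runs on its own, and confirms that the operator forms and the method calls agree.

```python
sample_space = {'HH', 'HT', 'TH', 'TT'}
A = {'HT', 'TT'}         # second flip is tails
B = {'HH', 'HT', 'TH'}   # at least one heads
C = {'HH'}               # both heads

# operator forms give the same sets as the method calls
assert A & B == A.intersection(B) == {'HT'}
assert A | C == A.union(C) == {'HH', 'HT', 'TT'}
assert sample_space - B == sample_space.difference(B) == {'TT'}

# every event is a subset of the sample space
assert all(event <= sample_space for event in (A, B, C))

# sorted() gives a stable print order, since sets are unordered
print(sorted(A & B), sorted(A | C), sorted(sample_space - B))
# ['HT'] ['HH', 'HT', 'TT'] ['TT']
```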
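we see that an event is a subset of the sample space Ω.
special events:
if A = Ω (the entire sample space), the event always happens;
if A = ∅ (the empty set), the event never happens.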
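The probability of an event
For an event A ⊆ Ω, its probability is the sum of the probabilities of the outcomes in A:
$$P(A) = \sum_{\omega \in A} P(\omega)$$
We can translate the above equation into Python code. In particular, we can compute the probability of an event encoded as a Python set event, where the probability space is encoded as a Python dictionary prob_space: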
def prob_of_event(event, prob_space):
    # P(A): add up the probabilities of the outcomes in the event
    return sum(prob_space[outcome] for outcome in event)
prob_space = {'sunny': 1/2, 'rainy': 1/6, 'snowy': 1/3}
rainy_or_snowy_event = {'rainy', 'snowy'}
print(prob_of_event(rainy_or_snowy_event, prob_space))
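Floating-point sums of entries like 1/6 and 1/3 can pick up rounding error in general, so the sketch below repeats the same computation with Python’s fractions.Fraction for exact results. It re-defines prob_of_event so the block is self-contained, and also illustrates the complement rule P(Ω \ A) = 1 − P(A) along with the probabilities of the whole sample space Ω (which is 1) and of the empty set (which is 0).

```python
from fractions import Fraction

def prob_of_event(event, prob_space):
    # P(A): add up the probabilities of the outcomes in the event
    return sum(prob_space[outcome] for outcome in event)

# same weather model, with exact fractions instead of floats
prob_space = {'sunny': Fraction(1, 2), 'rainy': Fraction(1, 6), 'snowy': Fraction(1, 3)}
sample_space = set(prob_space)

rainy_or_snowy = {'rainy', 'snowy'}
print(prob_of_event(rainy_or_snowy, prob_space))        # 1/2

# complement: P(sample space minus A) = 1 - P(A)
complement = sample_space - rainy_or_snowy
print(prob_of_event(complement, prob_space))            # 1/2

# the whole sample space always happens; the empty set never does
print(prob_of_event(sample_space, prob_space))          # 1
print(prob_of_event(set(), prob_space))                 # 0
```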
2 Random Variables
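In general, given a probability space (Ω, P), a random variable X maps each outcome ω ∈ Ω to a value X(ω). In the weather example below, W reports the weather itself, and I indicates whether it is sunny (1) or not (0). There are two ways to sample such random variables in code.
approach 1: sample an outcome from the probability space, then apply each random variable’s mapping to that outcome.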
import numpy as np

def sample_from_finite_probability_space(finite_prob_space):
    """
    Produces a random outcome from a given finite probability space.

    Input
    -----
    - finite_prob_space: finite probability space encoded as a dictionary

    Output
    ------
    - random outcome, which is one of the keys in the finite_prob_space
      dictionary (remember: these keys form the sample space)
    """
    # first produce a list of pairs of the form (outcome, outcome probability)
    outcome_probability_pairs = list(finite_prob_space.items())
    # convert the pairs into two tuples "outcomes" and "outcome_probabilities":
    # - outcomes: the outcomes
    # - outcome_probabilities: i-th element is the probability of the i-th
    #   outcome in "outcomes"
    # (this step is needed because NumPy wants these separately)
    outcomes, outcome_probabilities = zip(*outcome_probability_pairs)
    # use NumPy to randomly sample
    random_outcome = np.random.choice(outcomes, p=outcome_probabilities)
    return random_outcome
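prob_space = {'sunny': 1/2, 'rainy': 1/6, 'snowy': 1/3}
# W maps each outcome to the weather itself
W_mapping = {'sunny': 'sunny', 'rainy': 'rainy', 'snowy': 'snowy'}
# I maps each outcome to 1 if it is sunny and 0 otherwise
I_mapping = {'sunny': 1, 'rainy': 0, 'snowy': 0}
# sample ONE outcome, then apply both mappings to that same outcome
random_outcome = sample_from_finite_probability_space(prob_space)
W = W_mapping[random_outcome]
I = I_mapping[random_outcome]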
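approach 2: sample each random variable directly from its own probability table, which lists the values the random variable takes on and their probabilities (for example, I equals 1 with probability 1/2 because P(sunny) = 1/2).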
W_table = {'sunny': 1/2, 'rainy': 1/6, 'snowy': 1/3}
I_table = {0: 1/2, 1: 1/2}
W = sample_from_finite_probability_space(W_table)
I = sample_from_finite_probability_space(I_table)
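Both approaches give W and I the correct individual tables, but they differ in one important way: approach 1 derives both values from the same outcome, so I = 1 exactly when W is sunny, while the two separate draws of approach 2 ignore that link and can produce, say, W = 'rainy' together with I = 1. The simulation below sketches this; the helper names sample and consistent are hypothetical, it uses NumPy’s Generator API (numpy.random.default_rng) instead of the global np.random, and the seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator (seed chosen arbitrarily)

prob_space = {'sunny': 1/2, 'rainy': 1/6, 'snowy': 1/3}
W_mapping = {'sunny': 'sunny', 'rainy': 'rainy', 'snowy': 'snowy'}
I_mapping = {'sunny': 1, 'rainy': 0, 'snowy': 0}
W_table = {'sunny': 1/2, 'rainy': 1/6, 'snowy': 1/3}
I_table = {0: 1/2, 1: 1/2}

def sample(table):
    # draw one key of the dict, using the dict's values as probabilities
    keys = list(table.keys())
    return keys[rng.choice(len(keys), p=list(table.values()))]

def consistent(w, i):
    # I should be 1 exactly when W is 'sunny'
    return (w == 'sunny') == (i == 1)

num_trials = 10_000

# approach 1: one shared outcome, then apply both mappings
approach1 = []
for _ in range(num_trials):
    omega = sample(prob_space)
    approach1.append((W_mapping[omega], I_mapping[omega]))

# approach 2: sample W and I separately from their own tables
approach2 = [(sample(W_table), sample(I_table)) for _ in range(num_trials)]

frac1 = sum(not consistent(w, i) for w, i in approach1) / num_trials
frac2 = sum(not consistent(w, i) for w, i in approach2) / num_trials
print(frac1)  # 0.0
print(frac2)  # roughly 0.5
```

The fraction for approach 2 hovers around 1/2 because P(W sunny, I = 0) + P(W not sunny, I = 1) = 1/2 · 1/2 + 1/2 · 1/2 when the draws are separate. Keeping track of how random variables relate is the subject of jointly distributed random variables.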
RANDOM VARIABLES NOTATION AND TERMINOLOGY
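In this course, we denote random variables with capital/uppercase letters, such as X, W, and I.
We use the phrases “probability table”, “probability mass function” (abbreviated as PMF), and “probability distribution” (often simply called a distribution) to mean the same thing, and in particular we denote the probability table for X by p_X or p_X(·).
We write p_X(x) to denote the entry of the probability table that has label x, i.e., the probability that X takes on the value x. For example, from the weather example, p_W(rainy) = 1/6 and p_I(1) = 1/2.
Note that we use the same notation as in math, where a function f might also be written as f(·) to indicate explicitly that it is a function of one variable: both f and f(·) refer to the function, whereas f(x) refers to the value of f evaluated at the point x.
As an example of how to use all this notation, recall that a probability table consists of non-negative entries that add up to 1. In fact, each of the entries is at most 1 (otherwise the numbers would add to more than 1). For a random variable X taking on values in a set 𝒳, we can write these constraints as:
$$0 \le p_X(x) \le 1 \text{ for all } x \in \mathcal{X}, \qquad \sum_{x \in \mathcal{X}} p_X(x) = 1$$
Often in the course, if we are making statements about all possible outcomes of X, we omit writing the set 𝒳 explicitly, writing for example:
$$0 \le p_X(x) \le 1 \text{ for all } x, \qquad \sum_{x} p_X(x) = 1$$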
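The tables W_table and I_table used in approach 2 can be computed from the probability space and the mappings of approach 1: the table p_X of a random variable X adds up P(ω) over every outcome ω that X maps to the same value x. The helper random_variable_table below is a hypothetical name for this sketch, and it uses Fraction for exact arithmetic.

```python
from collections import defaultdict
from fractions import Fraction

def random_variable_table(prob_space, mapping):
    """Hypothetical helper: p_X(x) = sum of P(omega) over outcomes omega
    with X(omega) = x."""
    table = defaultdict(Fraction)  # Fraction() is 0, so missing values start at 0
    for omega, prob in prob_space.items():
        table[mapping[omega]] += prob
    return dict(table)

prob_space = {'sunny': Fraction(1, 2), 'rainy': Fraction(1, 6), 'snowy': Fraction(1, 3)}
W_mapping = {'sunny': 'sunny', 'rainy': 'rainy', 'snowy': 'snowy'}
I_mapping = {'sunny': 1, 'rainy': 0, 'snowy': 0}

p_W = random_variable_table(prob_space, W_mapping)
p_I = random_variable_table(prob_space, I_mapping)
print(p_W['rainy'])  # 1/6
print(p_I[1])        # 1/2
print(p_I[0])        # 1/2  (rainy and snowy both map to 0)

# the constraints: 0 <= p_X(x) <= 1 for all x, and the entries sum to 1
assert all(0 <= p <= 1 for p in p_I.values()) and sum(p_I.values()) == 1
```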
Exercise:
Functions of Random Variables
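Consider the random variable X with the probability table
{3: 2/3, 42: 1/3}
Is X² a random variable, with probability table
{9: 2/3, 1764: 1/3}?
In general, for a real-valued function g (i.e., it maps real numbers to real numbers), is g(X) a random variable?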
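One way to compute the probability table of g(X) from the table of X is to apply g to each value and pool the probabilities of values that land on the same result. The helper name function_of_random_variable is hypothetical, and g(x) = x mod 3 is an illustrative choice that shows the pooling case: both 3 and 42 map to 0, so their probabilities add.

```python
from collections import defaultdict
from fractions import Fraction

def function_of_random_variable(p_X, g):
    """Hypothetical helper: table of Y = g(X); values of X that share the
    same g(x) pool their probabilities."""
    p_Y = defaultdict(Fraction)
    for x, prob in p_X.items():
        p_Y[g(x)] += prob
    return dict(p_Y)

p_X = {3: Fraction(2, 3), 42: Fraction(1, 3)}

print(function_of_random_variable(p_X, lambda x: x ** 2))
# {9: Fraction(2, 3), 1764: Fraction(1, 3)}
print(function_of_random_variable(p_X, lambda x: x % 3))
# {0: Fraction(1, 1)}
```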
3 Jointly Distributed Random Variables
RELATING TWO RANDOM VARIABLES
At the most basic level, inference refers to using an observation to reason about some unknown quantity. In this course, the observation and the unknown quantity are represented by random variables. The main modeling question is: How do these random variables relate?
Let’s build on our earlier weather example, where now another outcome of interest appears, the temperature, which we quantize into two possible values “hot” and “cold”. Let’s suppose that we have the following probability space: