Probability is a numerical measure of the likelihood that an event will occur.
A probability function (or probability measure) is a function that assigns a number between 0 and 1 to each event in the sample space, where:
Interpretation: a probability of 0 means the event cannot occur, a probability of 1 means it is certain to occur, and values in between measure degrees of likelihood.
Examples with a fair six-sided die: P({3}) = 1/6, P({2, 4, 6}) = 1/2, and P({1, 2, 3, 4, 5, 6}) = 1.
Non-negativity: For any event A, P(A) ≥ 0.
Total Probability: The probability of the sample space is 1: P(Ω) = 1.
Additivity: For mutually exclusive events A and B (that is, events with no outcomes in common), P(A ∪ B) = P(A) + P(B).
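These three requirements can be checked concretely for the fair-die probability function; a sketch in Python (all names are my own):

```python
from fractions import Fraction
from itertools import combinations

# Sample space for one roll of a fair six-sided die
sample_space = {1, 2, 3, 4, 5, 6}

def P(event):
    """Probability of an event (a subset of the sample space) under a fair die."""
    return Fraction(len(event), len(sample_space))

# Non-negativity: P(A) >= 0 for every event A (every subset of the sample space)
assert all(P(set(a)) >= 0
           for r in range(7)
           for a in combinations(sample_space, r))

# Total probability: P(sample space) = 1
assert P(sample_space) == 1

# Additivity: for mutually exclusive (disjoint) events A and B,
# P(A union B) = P(A) + P(B)
A, B = {1, 2}, {5, 6}
assert P(A | B) == P(A) + P(B)
```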
Definition: A random variable is a function that assigns a value (numeric or categorical) to each outcome in the sample space Ω.
Examples:
Die roll - Parity: If Ω = {1, 2, 3, 4, 5, 6}, define X(ω) = even when ω is even and X(ω) = odd otherwise.
Die roll - Numerical: Same sample space, but random variable Y(ω) = ω, the number showing on the die.
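Both random variables are literally functions on outcomes; a minimal sketch (the names X and Y are my own):

```python
# Sample space for one roll of a fair six-sided die
omega = [1, 2, 3, 4, 5, 6]

def X(outcome):
    """Parity random variable: maps each outcome to 'even' or 'odd'."""
    return "even" if outcome % 2 == 0 else "odd"

def Y(outcome):
    """Numerical random variable: maps each outcome to itself."""
    return outcome

print([X(w) for w in omega])  # ['odd', 'even', 'odd', 'even', 'odd', 'even']
print([Y(w) for w in omega])  # [1, 2, 3, 4, 5, 6]
```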
Range and Events: The range of X is {even, odd}. Each value of a random variable defines an event: {X = even} corresponds to the subset {2, 4, 6} of the sample space.
A Probability Mass Function (PMF) assigns probabilities to the values of a discrete random variable.
Definition: For a discrete random variable X, the PMF is the function p(x) = P(X = x), the probability that X takes the value x.
Requirements: A valid PMF must satisfy: p(x) ≥ 0 for every value x, and Σ_x p(x) = 1.
Example: For our die parity random variable X, p(even) = P({2, 4, 6}) = 1/2 and p(odd) = P({1, 3, 5}) = 1/2.
Verification: p(even) + p(odd) = 1/2 + 1/2 = 1, so this is a valid PMF.
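The parity PMF and its verification can be reproduced in code; a sketch assuming a fair die (variable names are my own):

```python
from collections import defaultdict
from fractions import Fraction

# Each outcome of a fair die has probability 1/6
outcome_prob = {w: Fraction(1, 6) for w in range(1, 7)}

# PMF of the parity random variable: p(x) = P(X = x), obtained by
# summing the probabilities of all outcomes that map to the value x
pmf = defaultdict(Fraction)
for w, p in outcome_prob.items():
    x = "even" if w % 2 == 0 else "odd"
    pmf[x] += p

print(dict(pmf))  # {'odd': Fraction(1, 2), 'even': Fraction(1, 2)}

# Verify the two PMF requirements
assert all(p >= 0 for p in pmf.values())  # non-negativity
assert sum(pmf.values()) == 1             # values sum to 1
```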
Consider a medical scenario where we observe a single patient:
Experiment: Examine one patient and record their disease status and fever status
Sample Space: Ω = {(disease, fever), (disease, no fever), (no disease, fever), (no disease, no fever)}.
Disease Random Variable (D): D = true if the patient has the disease, D = false otherwise.
Fever Random Variable (F): F = true if the patient has a fever, F = false otherwise.
The joint probability distribution represents the probability of two events, described by different random variables, happening together: P(D = d, F = f) for each combination of disease status d and fever status f.

(table: joint probabilities P(D = d, F = f); the four entries sum to 1)

This could also be written using the chain rule, P(D = d, F = f) = P(F = f | D = d) P(D = d):

(table: conditional probabilities P(F | D) together with the marginals P(D))

Notice that the rows of the conditional table sum to 1: given either disease status, the patient either has a fever or does not.
We can learn a joint probability distribution from data by counting occurrences and computing relative frequencies.
Example: Suppose we examine 10 patients and observe:
Data: the observed (disease, fever) status of each of the 10 patients.

(table: one row per patient with disease status and fever status)

Count each combination: tally how many of the 10 patients fall into each (disease, fever) cell.

(table: counts for each of the four combinations)

Compute probabilities: divide each count by the total number of patients, so that P(D = d, F = f) ≈ count(d, f) / 10.

Estimated Joint Distribution:

(table: the four relative frequencies, which sum to 1)
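The counting procedure above can be sketched as follows (the patient records below are hypothetical, since the original data table is not reproduced here):

```python
from collections import Counter

# Hypothetical records for 10 patients: each entry is
# (has disease, has fever) -- illustrative values only
data = [
    (True, True), (True, True), (True, False),
    (False, True), (False, True), (False, False),
    (False, False), (False, False), (False, False), (False, False),
]

# Count each (disease, fever) combination
counts = Counter(data)

# Relative frequencies give the estimated joint distribution
joint = {combo: n / len(data) for combo, n in counts.items()}

for (d, f), p in sorted(joint.items(), reverse=True):
    print(f"P(D={d}, F={f}) = {p:.1f}")
```

The estimate is just count / total, so the four probabilities automatically sum to 1.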
As we move from specific examples to general probability identities, we shift our notation:
Event-based notation: P(A), where A is a specific event such as "the patient has a fever."
Variable-based notation: P(x), shorthand for P(X = x), where X is a random variable and x ranges over its possible values.
This shift allows us to write general identities like:
Marginalization (sum rule): P(x) = Σ_y P(x, y)
Chain rule (product rule): P(x, y) = P(y | x) P(x) = P(x | y) P(y)
Note that P(x) = Σ_y P(x, y) = Σ_y P(x | y) P(y)
(by combining marginalization with the chain rule).
So Bayes rule can be expressed as: P(y | x) = P(x | y) P(y) / Σ_y' P(x | y') P(y')
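A numerical sketch of Bayes rule with the denominator computed by marginalization (the prior and likelihood values below are illustrative, not from the text):

```python
# P(y): prior over the two classes
prior = {"disease": 0.1, "no disease": 0.9}

# P(fever | y): likelihood of the observation under each class
likelihood = {"disease": 0.8, "no disease": 0.05}

# Denominator by marginalization: P(fever) = sum_y P(fever | y) P(y)
p_fever = sum(likelihood[y] * prior[y] for y in prior)

# Bayes rule: P(y | fever) = P(fever | y) P(y) / P(fever)
posterior = {y: likelihood[y] * prior[y] / p_fever for y in prior}

print(round(p_fever, 3))                               # 0.125
print({y: round(p, 2) for y, p in posterior.items()})  # {'disease': 0.64, 'no disease': 0.36}
```

Even though the prior on disease is only 0.1, observing a fever raises its posterior to 0.64, because the fever is far more likely under the disease class.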
We can use Bayes rule to build a classifier: predict the class ŷ = argmax_y P(y | x) = argmax_y P(x | y) P(y).
Where x is the observed evidence (e.g., a patient's symptoms) and y is the class label; the denominator P(x) can be dropped because it is the same for every class y.
There is a serious problem with this! What is it?
P(x) = 0.2    P(~x) = 0.8
P(c | x) = 0.2    P(f | x) = 0.7    P(h | x) = 0.1
P(c | ~x) = 0.1    P(f | ~x) = 0.05    P(h | ~x) = 0.85
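With these numbers, Bayes rule gives the posterior probability of x after each possible observation; a sketch (the dictionary and function names are my own):

```python
# Values given above
P_x, P_not_x = 0.2, 0.8
P_obs_given_x = {"c": 0.2, "f": 0.7, "h": 0.1}
P_obs_given_not_x = {"c": 0.1, "f": 0.05, "h": 0.85}

def posterior_x(obs):
    """P(x | obs) via Bayes rule, marginalizing over x and ~x in the denominator."""
    num = P_obs_given_x[obs] * P_x
    den = num + P_obs_given_not_x[obs] * P_not_x
    return num / den

for obs in ("c", "f", "h"):
    print(f"P(x | {obs}) = {posterior_x(obs):.3f}")
```

For example, observing f gives P(x | f) = (0.7)(0.2) / ((0.7)(0.2) + (0.05)(0.8)) = 0.14 / 0.18 ≈ 0.778.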