Introduction to Probability for ML

Probability Distributions and Random Variables

Joint Probability Distributions

The joint probability distribution represents the probability of two events, described by different random variables, happening together:

\(X\)   \(Y\)       \(P(X,Y)\)
T       cold        .04
T       flu         .14
T       healthy     .02
F       cold        .08
F       flu         .04
F       healthy     .68

So \(P(X=True, Y=cold) = .04\).

This could also be written as \(P(X=True \cap Y=cold) = .04\).
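As an illustrative sketch (the dictionary layout and variable names are assumptions for this example, not anything prescribed by the notes), the joint distribution above can be stored as a lookup table keyed by \((x, y)\) pairs:

```python
# Joint distribution P(X, Y) from the table above, keyed by (x, y).
# X is True/False; Y is one of "cold", "flu", "healthy".
joint = {
    (True,  "cold"):    0.04,
    (True,  "flu"):     0.14,
    (True,  "healthy"): 0.02,
    (False, "cold"):    0.08,
    (False, "flu"):     0.04,
    (False, "healthy"): 0.68,
}

# P(X=True, Y=cold)
print(joint[(True, "cold")])  # 0.04

# A valid joint distribution sums to 1 over all outcomes.
assert abs(sum(joint.values()) - 1.0) < 1e-9
```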

Marginalization

Marginalization recovers the distribution of a single variable from the joint distribution by summing over all values of the other variable: \(P(X) = \sum_{y \in Y} P(X, Y=y)\).

\(X\)   \(Y\)       \(P(X,Y)\)
T       cold        .04
T       flu         .14
T       healthy     .02
F       cold        .08
F       flu         .04
F       healthy     .68

For example, \(P(X=True) = .04 + .14 + .02 = .20\).
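Continuing the sketch above (and reusing the hypothetical `joint` dictionary), marginalizing out \(Y\) is just a sum over the rows that share the same value of \(X\); the printed values may differ from .20 and .80 only by floating-point rounding:

```python
from collections import defaultdict

# Marginalize out Y: P(X=x) = sum over y of P(X=x, Y=y).
p_x = defaultdict(float)
for (x, y), p in joint.items():
    p_x[x] += p

print(p_x[True])   # 0.04 + 0.14 + 0.02 = 0.20
print(p_x[False])  # 0.08 + 0.04 + 0.68 = 0.80
```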

Conditional Probability

The conditional probability of \(Y\) given \(X\) is defined in terms of the joint and marginal distributions:

\[P(Y \mid X) = \frac{P(X, Y)}{P(X)}\]

Rearranging gives the chain rule: \(P(X, Y) = P(X \mid Y)P(Y)\).

For example, \(P(Y=cold \mid X=True) = .04 / .20 = .20\).
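Continuing the same sketch, conditioning on \(X=True\) renormalizes the matching rows of the joint table by the marginal `p_x[True]` computed above:

```python
# P(Y=y | X=True) = P(X=True, Y=y) / P(X=True)
p_y_given_x_true = {y: p / p_x[True] for (x, y), p in joint.items() if x}
print(p_y_given_x_true)
# {'cold': 0.2, 'flu': 0.7, 'healthy': 0.1}  (up to floating-point rounding)
```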

Bayes Theorem / Bayes Rule

\[P(Y \mid X) = \frac{P(X \mid Y)P(Y)}{P(X)}\]

Note that

\[P(X) = \sum_{y \in Y} P(X \mid y) P(y)\]

(by combining marginalization with the chain rule)

So Bayes rule can be expressed as:

\[P(Y \mid X) = \frac{P(X \mid Y)P(Y)}{ \sum_{y \in Y} P(X \mid y) P(y)}\]
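As a quick numerical check (again reusing the hypothetical `joint` table from the earlier sketches), we can evaluate the right-hand side of this expression and confirm it matches conditioning on the joint directly:

```python
from collections import defaultdict

# Marginal P(Y=y), by summing out X.
p_y = defaultdict(float)
for (x, y), p in joint.items():
    p_y[y] += p

# Likelihood P(X=True | Y=y), from the chain rule P(X, Y) = P(X | Y) P(Y).
p_xtrue_given_y = {y: joint[(True, y)] / p_y[y] for y in p_y}

# Bayes rule with the denominator expanded by marginalization:
# P(Y=cold | X=True) = P(X=True | cold) P(cold) / sum_y P(X=True | y) P(y)
numer = p_xtrue_given_y["cold"] * p_y["cold"]
denom = sum(p_xtrue_given_y[y] * p_y[y] for y in p_y)
print(numer / denom)  # 0.2, same as P(X=True, Y=cold) / P(X=True)
```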

Bayes Classifier

We can use Bayes rule to build a classifier:

\[P(Y \mid X_1, X_2, ..., X_d) = \frac{P(X_1, X_2, ..., X_d \mid Y)P(Y)}{P(X_1, X_2, ..., X_d)}\]

Where \(Y\) corresponds to the class label and each \(X_i\) is an attribute.

There is a serious problem with this! What is it?

Naive Bayes Classifier