The joint probability distribution represents the probability of two events, described by different random variables, happening together:
\(X\) | \(Y\) | \(P(X,Y)\) |
---|---|---|
T | cold | .04 |
T | flu | .14 |
T | healthy | .02 |
F | cold | .08 |
F | flu | .04 |
F | healthy | .68 |
So \(P(X=\text{True}, Y=\text{cold}) = .04\).
This could also be written: \(P(X=\text{True} \cap Y=\text{cold}) = .04\)
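As a sketch, the joint table above can be stored as a dictionary keyed by \((x, y)\) pairs, which makes lookup and marginalization direct (variable names here are illustrative, not from the original):

```python
# Joint distribution P(X, Y) from the table above
joint = {
    (True, "cold"): 0.04,
    (True, "flu"): 0.14,
    (True, "healthy"): 0.02,
    (False, "cold"): 0.08,
    (False, "flu"): 0.04,
    (False, "healthy"): 0.68,
}

# A valid joint distribution sums to 1 over all outcomes
assert abs(sum(joint.values()) - 1.0) < 1e-9

# Looking up a joint probability: P(X=True, Y=cold)
print(joint[(True, "cold")])  # 0.04

# Marginalizing out Y gives P(X): P(X=True) = .04 + .14 + .02
p_x_true = sum(p for (x, y), p in joint.items() if x)
print(round(p_x_true, 2))  # 0.2
```

Marginalization is just summing the joint over the variable being dropped, which the last line does directly.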
\[P(Y \mid X) = \frac{P(X \mid Y)P(Y)}{P(X)}\]
Note that
\[P(X) = \sum_{y \in Y} P(X \mid y) P(y)\]
(this is the law of total probability: marginalize over \(Y\), then expand each term with the chain rule)
So Bayes rule can be expressed as:
\[P(Y \mid X) = \frac{P(X \mid Y)P(Y)}{ \sum_{y \in Y} P(X \mid y) P(y)}\]
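This form of Bayes rule can be checked numerically against the joint table from earlier; the function names below are illustrative assumptions, not part of the original notes:

```python
# Joint distribution P(X, Y) from the table earlier in the notes
joint = {
    (True, "cold"): 0.04, (True, "flu"): 0.14, (True, "healthy"): 0.02,
    (False, "cold"): 0.08, (False, "flu"): 0.04, (False, "healthy"): 0.68,
}
ys = ["cold", "flu", "healthy"]

def p_y(y):
    # Marginal P(Y=y): sum the joint over both values of X
    return joint[(True, y)] + joint[(False, y)]

def p_x_given_y(x, y):
    # Conditional P(X=x | Y=y) via the definition of conditional probability
    return joint[(x, y)] / p_y(y)

def p_y_given_x(y, x):
    # Bayes rule with the law-of-total-probability denominator:
    # P(Y=y | X=x) = P(X=x | Y=y) P(Y=y) / sum_y' P(X=x | y') P(y')
    denom = sum(p_x_given_y(x, yp) * p_y(yp) for yp in ys)
    return p_x_given_y(x, y) * p_y(y) / denom

print(round(p_y_given_x("cold", True), 3))  # 0.2
```

Note the answer matches computing \(P(Y=\text{cold} \mid X=\text{True})\) directly from the joint: \(.04 / .20 = .2\), since the denominator of Bayes rule is exactly the marginal \(P(X=\text{True})\).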
We can use Bayes rule to build a classifier:
\[P(Y \mid X_1, X_2, \ldots, X_d) = \frac{P(X_1, X_2, \ldots, X_d \mid Y)P(Y)}{P(X_1, X_2, \ldots, X_d)}\]
where \(Y\) corresponds to the class label and each \(X_i\) is an attribute.
There is a serious problem with this! What is it?