Naive Bayes Examples and Discussion
“Training” a Naive Bayes Classifier
- Recall the naive Bayes classifier:
- \[P(Y \mid X_1, X_2, ..., X_d) \propto
P(Y)\prod_{i=1}^d P(X_i \mid Y)\]
- To perform classification we need:
- Class priors: \(P(Y)\)
- Class-conditional attribute probabilities: \(P(X_i \mid Y)\) for all \(i\).
- These were the tallies from our in-class exercise:
| \(Spy\) | \(Golfer\) | \(Fedora\) | Count |
|---|---|---|---|
| T | T | T | 1 |
| T | T | F | 3 |
| T | F | T | 1 |
| T | F | F | 0 |
| F | T | T | 4 |
| F | T | F | 3 |
| F | F | T | 6 |
| F | F | F | 2 |
- From this we can easily estimate our priors:
- \(P(Spy=True) = 5/20 = .25\)
- \(P(Spy=False) = 15/20 = .75\)
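The tally and the prior estimates can be reproduced in a few lines of Python (a sketch; the dictionary layout and variable names are my own):

```python
# Tally from the in-class exercise: (Spy, Golfer, Fedora) -> count
counts = {
    (True, True, True): 1, (True, True, False): 3,
    (True, False, True): 1, (True, False, False): 0,
    (False, True, True): 4, (False, True, False): 3,
    (False, False, True): 6, (False, False, False): 2,
}

total = sum(counts.values())                                  # 20 examples
spy_true = sum(c for (s, g, f), c in counts.items() if s)     # 5 spies

p_spy_true = spy_true / total       # 5/20 = .25
p_spy_false = 1 - p_spy_true        # 15/20 = .75
```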
- We can also calculate the (full) class-conditional probability
distributions:
| \(Golfer\) | \(Fedora\) | \(P(Golfer, Fedora \mid Spy = True)\) |
|---|---|---|
| T | T | 1/5 = .2 |
| T | F | 3/5 = .6 |
| F | T | 1/5 = .2 |
| F | F | 0/5 = 0 |

| \(Golfer\) | \(Fedora\) | \(P(Golfer, Fedora \mid Spy = False)\) |
|---|---|---|
| T | T | 4/15 \(\approx\) .27 |
| T | F | 3/15 = .2 |
| F | T | 6/15 = .4 |
| F | F | 2/15 \(\approx\) .13 |
- However, for naive Bayes classification we instead need this:
| \(Golfer\) | \(P(Golfer \mid Spy = True)\) |
|---|---|
| T | 4/5 = .8 |
| F | 1/5 = .2 |

| \(Golfer\) | \(P(Golfer \mid Spy = False)\) |
|---|---|
| T | 7/15 \(\approx\) .47 |
| F | 8/15 \(\approx\) .53 |

| \(Fedora\) | \(P(Fedora \mid Spy = True)\) |
|---|---|
| T | 2/5 = .4 |
| F | 3/5 = .6 |

| \(Fedora\) | \(P(Fedora \mid Spy = False)\) |
|---|---|
| T | 10/15 \(\approx\) .67 |
| F | 5/15 \(\approx\) .33 |
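With these tables in hand, a full classification can be sketched in a few lines. Exact fractions are used so the arithmetic matches the tables; the function name and data layout are my own:

```python
from fractions import Fraction as F

# Priors and marginal class-conditionals from the tables above.
p_spy = {True: F(5, 20), False: F(15, 20)}
p_golfer_given_spy = {True:  {True: F(4, 5),  False: F(1, 5)},
                      False: {True: F(7, 15), False: F(8, 15)}}
p_fedora_given_spy = {True:  {True: F(2, 5),  False: F(3, 5)},
                      False: {True: F(10, 15), False: F(5, 15)}}

def posterior_spy(golfer, fedora):
    """P(Spy | Golfer, Fedora) via the naive Bayes proportionality."""
    score = {y: p_spy[y]
                * p_golfer_given_spy[y][golfer]
                * p_fedora_given_spy[y][fedora]
             for y in (True, False)}
    z = score[True] + score[False]   # normalize so the posteriors sum to 1
    return {y: s / z for y, s in score.items()}

# e.g. a golfer wearing a fedora:
post = posterior_spy(golfer=True, fedora=True)   # P(Spy=T | ...) = 12/47
```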
Properties of Naive Bayes
- Pros:
- Provides a meaningful class probability, not just a class label
- Works in the face of missing attributes (just don’t include them in
the calculation)
- Relatively easy to interpret: we can examine the class-conditional
probabilities for individual attributes.
- Cons:
- Classification performance may be worse than other classifiers: Most
real classification tasks will violate the independence
assumption to some extent.
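The tables above make the last point concrete: under \(Spy = True\), the joint probability of \(Golfer = T\) and \(Fedora = T\) is .2, but the product of the marginals is \(.8 \times .4 = .32\), so even this toy data violates the independence assumption. A quick check:

```python
# Values read off the tables in this handout (Spy = True column).
p_joint_tt   = 1 / 5   # P(Golfer=T, Fedora=T | Spy=True) = .2
p_golfer_t   = 4 / 5   # P(Golfer=T | Spy=True) = .8
p_fedora_t   = 2 / 5   # P(Fedora=T | Spy=True) = .4

product = p_golfer_t * p_fedora_t   # .32 -- not equal to the joint .2
```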
Implementation Issues
- Naive Bayes classifier: \[P(Y \mid X_1,
X_2, ..., X_d) \propto P(Y)\prod_{i=1}^d P(X_i \mid Y)\]
- Each \(P(X_i \mid Y)\) is less than
1.
- What is \(.5^{100}\)? \(.5^{1000}\)?
- Recall that \(\log(ab) = \log(a) +
\log(b)\)
- Also, the log function is monotonic: if \(a > b\) then \(\log(a) > \log(b)\)
- So, practical implementations generally work with logs: \[\log \left(P(Y)\prod_{i=1}^d P(X_i \mid Y)\right)
= \log (P(Y)) + \sum_{i=1}^d \log(P(X_i \mid Y))\]
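A small sketch of the underflow problem and the log-space fix (the number of attributes and the probabilities here are illustrative, not from the exercise):

```python
import math

# Multiplying many probabilities of .5 underflows a 64-bit float to 0:
prod = 1.0
for _ in range(1100):       # e.g. 1100 attributes, each P(X_i | Y) = .5
    prod *= 0.5
# prod is now exactly 0.0 -- the score is lost.

# Summing logs instead keeps the score finite and comparable:
log_score = math.log(0.5)   # illustrative log prior
for _ in range(1100):
    log_score += math.log(0.5)
# log_score is a finite negative number, usable for argmax over classes.
```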
Implementation Issues
- How to handle zeros for some attributes? …
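One standard remedy, assumed here since the notes break off, is Laplace (add-\(\alpha\)) smoothing: pretend each (attribute value, class) pair was observed \(\alpha\) extra times, so no conditional probability is ever exactly zero. A minimal sketch:

```python
def smoothed_conditional(count_xy, count_y, num_values=2, alpha=1):
    """Laplace-smoothed estimate of P(X = x | Y = y).

    count_xy:   joint count of X = x and Y = y
    count_y:    count of Y = y
    num_values: number of values X can take (2 for boolean attributes)
    """
    return (count_xy + alpha) / (count_y + alpha * num_values)

# An attribute value never seen with a class of 5 training examples no
# longer gets probability 0:
p = smoothed_conditional(0, 5)   # 1/7, instead of 0/5 = 0
```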