Maximum Likelihood Learning

Bayesian Learning

Maximum a Posteriori Learning

Maximum Likelihood Learning

Turning our single layer network into a probability distribution...

Aside: Derivative of the Sigmoid/Logistic Function

The logistic function has a simple derivative:

\[\sigma'(x) = \sigma(x)(1 - \sigma(x))\]
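
The identity above can be checked numerically. The following is a minimal sketch, not part of the original notes: the helper names `sigmoid`, `sigmoid_grad`, `central_difference`, and `max_abs_error` are illustrative assumptions, and the step size `h = 1e-5` is an arbitrary choice. It compares the closed-form derivative against a central finite-difference estimate at a few points.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function: 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x: float) -> float:
    """Closed-form derivative: sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


def central_difference(f, x: float, h: float = 1e-5) -> float:
    """Numerical derivative estimate: (f(x + h) - f(x - h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def max_abs_error(points) -> float:
    """Largest gap between the closed form and the numerical estimate."""
    return max(
        abs(sigmoid_grad(x) - central_difference(sigmoid, x)) for x in points
    )


if __name__ == "__main__":
    # Hypothetical test points, chosen only for illustration.
    print(max_abs_error([-3.0, -1.0, 0.0, 0.5, 2.0]))
```

The closed form is convenient in practice because the derivative reuses the value of \(\sigma(x)\) already computed in the forward pass, so no extra exponential is needed.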

One Weird Trick...

Maximum Likelihood for Logistic Regression:

Why not just use MSE?