Maximum Likelihood Learning and Logistic Regression

Review: Linear Regression

The goal was to find the weight values that minimize mean squared error (MSE) on the training data, which amounts to fitting a hyperplane.
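
As a reminder of what that fit looks like in practice, here is a minimal sketch using NumPy's least-squares solver. The function names `fit_linear_regression` and `mse`, the synthetic data, and the choice to append the bias as a final column of ones are all assumptions made for illustration, not something from the lecture.

```python
import numpy as np

def fit_linear_regression(X, y):
    """Return the weights (bias last) that minimize MSE on (X, y)."""
    # Append a column of ones so the bias is learned as an extra weight.
    X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
    # Least squares minimizes the sum of squared errors, and hence the MSE.
    w, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
    return w

def mse(X, y, w):
    """Mean squared error of the hyperplane defined by w on (X, y)."""
    X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
    return float(np.mean((X_aug @ w - y) ** 2))

# Synthetic, noise-free data generated from a known hyperplane (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
true_w = np.array([2.0, -1.0, 0.5])  # two weights followed by the bias
y = X @ true_w[:2] + true_w[2]

w = fit_linear_regression(X, y)
print(w, mse(X, y, w))
```

Because the data here contain no noise, the recovered weights should match `true_w` up to floating-point error.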

Turning our single-layer network into a probability distribution…

Today: Logistic Regression

After applying a sigmoid non-linearity, the output lies strictly between 0 and 1, so we can interpret it as a probability.
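
A minimal sketch of this idea, assuming a single-layer model with a weight vector `w` and a bias `b` (the names `sigmoid` and `predict_proba` and the example numbers are hypothetical, chosen only for illustration):

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(X, w, b):
    """Probability of the positive class for each row of X."""
    # Same linear unit as in linear regression, followed by the sigmoid.
    return sigmoid(X @ w + b)

# Illustrative inputs and parameters, not taken from the lecture.
X = np.array([[0.0, 0.0],
              [1.0, 2.0],
              [-3.0, 1.0]])
w = np.array([1.0, -0.5])
b = 0.0

p = predict_proba(X, w, b)
print(p)
```

An input whose linear activation is exactly 0 maps to a probability of 0.5, which is the decision boundary. This version is not hardened against overflow for very large negative activations.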

Aside: Derivative of the Sigmoid/Logistic Function

The logistic function has a simple derivative:

\[\sigma'(x) = \sigma(x)(1 - \sigma(x))\]
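
One way to convince yourself of this identity is to compare it against a finite-difference estimate. The sketch below is only a numerical sanity check under assumed settings (the step size `eps` and the test points are arbitrary choices), not a proof.

```python
import numpy as np

def sigmoid(x):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    """Derivative of the logistic function, using sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

# Central finite differences at a handful of points (illustrative choices).
x = np.linspace(-5.0, 5.0, 11)
eps = 1e-6
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2.0 * eps)

# Largest disagreement between the closed form and the numerical estimate.
max_err = float(np.max(np.abs(numeric - sigmoid_grad(x))))
print(max_err)
```

The derivative peaks at `x = 0`, where it equals 0.25, and shrinks toward zero as the sigmoid saturates in either direction.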

Maximum a Posteriori Learning

Maximum Likelihood Learning

One Weird Trick…

Maximum Likelihood for Logistic Regression:

Why not just use MSE?

Minimizing Cross Entropy Loss for Logistic Regression

What If We Have More Than Two Classes?

Pros and Cons of Logistic Regression