L2 Regularization:
Also known as weight decay or Ridge regression
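A minimal sketch of the penalty it adds to the training loss (λ for the regularization strength, L_data for the unregularized loss, and w for the weights are assumed symbols, not taken from the slide):

$$ L_{\text{total}}(w) = L_{\text{data}}(w) + \lambda \sum_i w_i^2 $$

Shrinking every weight toward zero in proportion to its size is why the same idea is also called weight decay.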
L1 Regularization:
https://commons.wikimedia.org/wiki/File:Regularization.jpg
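The L1 (lasso) penalty differs only in the norm, with the same assumed symbols:

$$ L_{\text{total}}(w) = L_{\text{data}}(w) + \lambda \sum_i |w_i| $$

The absolute-value penalty tends to push many weights exactly to zero, so it also acts as a feature selector.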
“Vanishing Gradients” are a problem when training deep neural networks…
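One way to see why: by the chain rule, the gradient reaching an early layer is a product of per-layer terms, and if each term is smaller than one the product shrinks roughly exponentially with depth. A sketch, where the activations $h^{(k)}$ and depth $n$ are illustrative symbols:

$$ \frac{\partial L}{\partial h^{(1)}} = \frac{\partial L}{\partial h^{(n)}} \prod_{k=2}^{n} \frac{\partial h^{(k)}}{\partial h^{(k-1)}} $$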
Logistic function:
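The standard definition and its derivative, for reference:

$$ \sigma(x) = \frac{1}{1 + e^{-x}}, \qquad \sigma'(x) = \sigma(x)\,\bigl(1 - \sigma(x)\bigr) \le \tfrac{1}{4} $$

Because the derivative is at most 1/4 and saturates for large |x|, each logistic layer can shrink the gradient product above.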
ReLU
(Around 53,000 citations on Google Scholar.)
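For comparison, a sketch of ReLU and its derivative:

$$ \mathrm{ReLU}(x) = \max(0, x), \qquad \mathrm{ReLU}'(x) = \begin{cases} 1 & x > 0 \\ 0 & x < 0 \end{cases} $$

For positive inputs the derivative is exactly 1, so the backward pass does not shrink the gradient there, which is one reason ReLU helps with vanishing gradients.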
At each layer, each input is adjusted according to:
Then adjusted as:
to restore representational power (both steps are sketched below).
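The two adjustment steps read like batch normalization; a minimal sketch under that assumption, where μ_B and σ_B² are the mini-batch mean and variance, ε a small constant, and γ, β learned parameters:

$$ \hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \qquad y_i = \gamma \, \hat{x}_i + \beta $$

The learned γ and β let the layer recover any scale and shift that the normalization removed.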