Training Neural Networks

Preventing Overfitting

Preventing Overfitting - Weight Penalties
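A minimal sketch of an L2 weight penalty (weight decay) in NumPy. The function names and the value of `lam` are illustrative assumptions, not from the slides; `lam` would be tuned on a validation set.

```python
import numpy as np

def l2_penalized_loss(loss, weights, lam=1e-4):
    # Add lam * sum of squared weights to the base loss.
    # Large weights are penalized, which discourages overfitting.
    penalty = lam * sum(np.sum(w ** 2) for w in weights)
    return loss + penalty

def l2_grad(w, lam=1e-4):
    # Gradient of the penalty w.r.t. w is 2 * lam * w, so each
    # gradient step shrinks the weights toward zero ("weight decay").
    return 2.0 * lam * w
```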

ReLUs

“Vanishing gradients” are a problem when training deep neural networks: as errors are backpropagated through many layers of saturating activations, repeated multiplication by small derivatives shrinks the gradient toward zero.

Logistic function: σ(x) = 1 / (1 + e^(−x)). Its derivative σ(x)(1 − σ(x)) is at most 1/4 and approaches zero when the unit saturates.

ReLU
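The contrast between the two activations can be seen numerically. A minimal sketch (function names are mine): the logistic derivative is tiny for large |x|, while the ReLU derivative is exactly 1 on the active side, so gradients pass through undiminished.

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def logistic_grad(x):
    s = logistic(x)
    return s * (1.0 - s)          # at most 0.25; ~0 when the unit saturates

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    return (x > 0).astype(float)  # exactly 0 or 1: no shrinking on the active path

x = np.array([-10.0, 0.0, 10.0])
print(logistic_grad(x))  # tiny at the tails: this is where gradients vanish
print(relu_grad(x))
```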

Weight Initialization, Pre-Training and AutoEncoders
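One common initialization scheme for ReLU networks is He/Kaiming initialization; this is a sketch of that scheme, not necessarily the exact one used in these slides. Sampling weights with variance 2/fan_in keeps activation magnitudes roughly constant from layer to layer.

```python
import numpy as np

rng = np.random.default_rng(0)

def he_init(fan_in, fan_out):
    # He initialization: zero-mean Gaussian with std sqrt(2 / fan_in),
    # chosen so that ReLU layers preserve the variance of their inputs.
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

W = he_init(256, 128)
print(W.std())  # close to sqrt(2/256) ~ 0.088
```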

Momentum

Nice visualizations of gradient-descent optimizers, with and without momentum, are available online.
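The momentum update can be sketched as follows (a minimal illustration with assumed hyperparameters, not the slides' exact formulation): a velocity accumulates an exponentially decaying average of past gradients, smoothing the optimization path and speeding travel along consistent directions.

```python
def sgd_momentum_step(w, v, grad, lr=0.01, beta=0.9):
    # Velocity: decay the old velocity and add the new (scaled) gradient step.
    v = beta * v - lr * grad
    # Move the parameter along the velocity, not the raw gradient.
    w = w + v
    return w, v

# Minimize f(w) = w^2 (gradient 2w) starting from w = 5.0.
w, v = 5.0, 0.0
for _ in range(100):
    w, v = sgd_momentum_step(w, v, 2.0 * w)
print(w)  # near the minimum at 0
```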

Batch Normalization

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, Ioffe and Szegedy, 2015.

(Around 53,000 citations on Google Scholar.)
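The batch-norm forward pass from Ioffe and Szegedy (2015) standardizes each feature using mini-batch statistics, then applies a learned scale γ and shift β. A minimal training-time sketch (inference would use running averages of the batch statistics instead):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # x has shape (batch, features); normalize each feature over the batch.
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)   # zero mean, unit variance per feature
    return gamma * x_hat + beta             # learned scale and shift

x = np.array([[1.0, 50.0],
              [3.0, 70.0],
              [5.0, 90.0]])
y = batch_norm(x, gamma=np.ones(2), beta=np.zeros(2))
print(y.mean(axis=0), y.std(axis=0))  # per-feature mean ~ 0, std ~ 1
```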