Mechanical Dreams

Normalization Layer Per-Example Gradients are Sufficient to Predict Gradient Noise Scale in Transformers



Mechanical Dreams, by Mechanical Dirk