December 20, 2024
Normalization Layer Per-Example Gradients are Sufficient to Predict Gradient Noise Scale in Transformers
By Mechanical Dirk
9 minutes