In this episode:
• Welcome and Introduction: Professor Norris and Linda introduce the episode and the paper of the week: 'Backward Gradient Normalization in Deep Neural Networks'.
• The Ghost of Gradients Past: A discussion on the classic vanishing and exploding gradient problems, and why existing solutions like Batch Normalization and ResNets still leave room for improvement.
• Unpacking Backward Gradient Normalization: Linda explains the core mechanics of the BGN layer, detailing how it leaves the forward pass untouched while rescaling gradients during backpropagation (see the sketch after this list).
• Visualizing the Flow: The hosts walk through the paper's experiments on networks 90 layers deep, comparing how gradients decay across ReLU, Sigmoid, and Tanh activation functions.
• Results, Trade-offs, and Conclusions: A breakdown of BGN's accuracy improvements and training-time efficiency compared to Batch Normalization on the MNIST dataset, followed by final thoughts.
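
To make the "Unpacking Backward Gradient Normalization" segment concrete, here is a minimal PyTorch-style sketch of the idea discussed: a layer that acts as an identity in the forward pass and rescales the gradient in the backward pass. The class names (`BGNFunction`, `BGN`), the epsilon constant, and the specific norm-based rescaling are illustrative assumptions; the exact normalization rule used in the paper may differ.

```python
import torch

class BGNFunction(torch.autograd.Function):
    """Identity in the forward pass; gradient rescaling in the backward pass."""

    @staticmethod
    def forward(ctx, x):
        # Forward pass is left untouched: activations flow through unchanged.
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Rescale the incoming gradient so its magnitude stays controlled as it
        # continues toward earlier layers (illustrative choice; the paper's
        # exact scaling may differ).
        eps = 1e-8  # assumed small constant to avoid division by zero
        return grad_output / (grad_output.norm() + eps)


class BGN(torch.nn.Module):
    """Drop-in module wrapper around the custom autograd function."""

    def forward(self, x):
        return BGNFunction.apply(x)


# Usage: insert between layers of a deep stack. The forward output is unchanged,
# but gradients flowing back through this point are normalized.
block = torch.nn.Sequential(torch.nn.Linear(64, 64), BGN(), torch.nn.Sigmoid())
```

Because the layer is a no-op on activations, it adds essentially no cost at inference time; all of its effect is confined to backpropagation.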