AI: post transformers

Xavier Initialization: Training Difficulties and Solutions in Deep Feedforward Networks


This document explores the challenges of training deep feedforward neural networks, specifically why standard gradient descent from random initialization performs poorly. The authors examine how different non-linear activation functions (the sigmoid, the hyperbolic tangent, and the softsign) affect network performance and unit saturation. They then analyze how activations and gradients vary across layers and over the course of training, which motivates a new normalized initialization scheme, now widely known as Xavier (or Glorot) initialization, designed to accelerate convergence. The findings suggest that appropriate activation functions and initialization techniques are crucial to the learning dynamics and overall effectiveness of deep neural networks.
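
The proposed scheme draws each weight from a uniform distribution scaled by the layer's fan-in and fan-out, W ~ U[-sqrt(6/(n_in + n_out)), +sqrt(6/(n_in + n_out))], so that activation and gradient variances stay roughly constant across layers. A minimal NumPy sketch of that rule follows; the function name and layer sizes are illustrative, not from the paper:

    import numpy as np

    def xavier_uniform(fan_in, fan_out, rng=np.random.default_rng()):
        # Normalized ("Xavier"/Glorot) initialization from the paper:
        # W ~ U[-sqrt(6/(fan_in + fan_out)), +sqrt(6/(fan_in + fan_out))]
        limit = np.sqrt(6.0 / (fan_in + fan_out))
        return rng.uniform(-limit, limit, size=(fan_in, fan_out))

    # Example: weight matrix for a hypothetical 784 -> 256 fully connected layer
    W = xavier_uniform(784, 256)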


Source: https://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf


By mcgrof