Neural Intel Pod

SAD Neural Networks, Divergent Gradient Flows, and Optimality



This academic paper examines the training dynamics of neural networks, focusing on gradient flow for fully connected feedforward networks with various smooth activation functions. The authors establish a dichotomy: gradient flow either converges to a critical point or diverges to infinity while the loss converges to a generalized critical value. Using the mathematical framework of o-minimal structures, they prove that for certain nonlinear polynomial target functions, sufficiently large networks and datasets yield loss values that approach zero only asymptotically, so that well-initialized gradient flow necessarily diverges. The paper supports these theoretical findings with numerical experiments on polynomial regression and real-world tasks, observing that the parameter norm grows as the loss decreases.
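
The central object is the gradient flow ODE θ̇(t) = -∇L(θ(t)), where L is the training loss; the dichotomy says the trajectory either converges to a critical point of L or its norm tends to infinity while L(θ(t)) approaches a generalized critical value. As an illustration only (not the authors' code or exact experimental setup), the sketch below trains a small tanh network on a hypothetical polynomial regression task with plain gradient descent, a discretization of gradient flow, and logs both the loss and the parameter norm so the reported trend (loss decreasing while the parameter norm grows) can be checked empirically.

# Minimal sketch, not the paper's code: gradient descent on a one-hidden-layer
# tanh network fit to a polynomial target, logging loss and parameter norm.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical polynomial regression data: target f(x) = x^3 - x on [-1, 1].
X = np.linspace(-1.0, 1.0, 64).reshape(-1, 1)
y = X**3 - X

width = 32  # hidden width; a smooth (tanh) activation, as in the paper's setting
W1 = rng.normal(scale=0.5, size=(1, width))
b1 = np.zeros(width)
W2 = rng.normal(scale=0.5, size=(width, 1))
b2 = np.zeros(1)

lr = 0.05
n = X.shape[0]
for step in range(20001):
    # Forward pass.
    h = np.tanh(X @ W1 + b1)          # (n, width)
    pred = h @ W2 + b2                # (n, 1)
    err = pred - y
    loss = np.mean(err**2)

    # Backward pass for the mean-squared-error loss.
    d_pred = 2.0 * err / n            # dL/dpred
    dW2 = h.T @ d_pred
    db2 = d_pred.sum(axis=0)
    d_h = d_pred @ W2.T
    d_pre = d_h * (1.0 - h**2)        # tanh derivative
    dW1 = X.T @ d_pre
    db1 = d_pre.sum(axis=0)

    # Gradient-descent update: a discretization of gradient flow.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

    if step % 5000 == 0:
        norm = np.sqrt(sum((p**2).sum() for p in (W1, b1, W2, b2)))
        print(f"step {step:6d}  loss {loss:.6f}  ||theta|| {norm:.3f}")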

Neural Intel Pod, by Neural Intelligence Network