
This academic paper explores the training dynamics of neural networks, focusing on gradient flow for fully connected feedforward networks with a range of smooth activation functions. The authors establish a dichotomy: gradient flow either converges to a critical point or diverges to infinity while the loss converges to a generalized critical value. Using the mathematical framework of o-minimal structures, they prove that for certain nonlinear polynomial target functions, sufficiently large networks and datasets force the loss to approach zero only asymptotically, so that gradient flow diverges under well-chosen initializations. The paper supports these theoretical findings with numerical experiments on polynomial regression and real-world tasks, observing that the parameter norm increases as the loss decreases.
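To make the described phenomenon concrete, here is a minimal sketch (not the authors' code) of a discretized gradient flow: a small one-hidden-layer tanh network fit to an illustrative polynomial target y = x^2 by plain gradient descent with a small step size. The network width, step size, initialization, and target are assumptions chosen for illustration; the point is simply to show the reported qualitative behavior, with the loss decreasing toward zero while the parameter norm keeps growing.

```python
# Minimal sketch (illustrative, not the paper's experiments): Euler-discretized
# gradient flow on a tiny one-hidden-layer tanh network fitting y = x^2.
# Tracks the training loss and the parameter norm over time.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 50)                    # inputs
y = x ** 2                                        # nonlinear polynomial target
width = 20
W1 = rng.normal(0.0, 1.0, (width, 1))             # hidden-layer weights
b1 = np.zeros(width)                              # hidden-layer biases
W2 = rng.normal(0.0, 1.0 / np.sqrt(width), width) # output weights

lr = 1e-2                                         # small step approximating gradient flow
for step in range(100001):
    z = x[:, None] * W1.T + b1                    # pre-activations, shape (50, width)
    h = np.tanh(z)                                # hidden activations
    pred = h @ W2                                 # network output, shape (50,)
    err = pred - y
    loss = 0.5 * np.mean(err ** 2)

    # Backpropagate the mean squared loss by hand.
    g_pred = err / len(x)
    g_W2 = h.T @ g_pred
    g_h = np.outer(g_pred, W2)
    g_z = g_h * (1.0 - h ** 2)                    # tanh'(z) = 1 - tanh(z)^2
    g_W1 = (g_z * x[:, None]).sum(axis=0)[:, None]
    g_b1 = g_z.sum(axis=0)

    # Gradient descent step.
    W1 -= lr * g_W1
    b1 -= lr * g_b1
    W2 -= lr * g_W2

    if step % 20000 == 0:
        norm = np.sqrt((W1 ** 2).sum() + (b1 ** 2).sum() + (W2 ** 2).sum())
        print(f"step {step:6d}  loss {loss:.3e}  parameter norm {norm:.3f}")
```

Running this typically shows the loss shrinking steadily while the printed parameter norm grows, mirroring the divergence-with-vanishing-loss behavior the paper analyzes; the exact numbers depend on the assumed width, initialization, and step size.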