Nonlinear Dynamics In Optimization Landscape of Shallow Neural Networks with Tunable Leaky ReLU
This work provides theoretical insights into optimization challenges for neural networks, but it is incremental as it builds on existing equivariant gradient degree methods.
The authors study the nonlinear dynamics in the optimization landscape of shallow neural networks with leaky ReLU activation. They establish a theoretical framework to detect bifurcations of critical points as the leaky parameter varies, and show that a multi-mode degeneracy occurs at the critical number 0 regardless of the network width.
In this work, we study the nonlinear dynamics of a shallow neural network trained with mean-squared loss and leaky ReLU activation. Under Gaussian inputs and equal layer width $k$, (1) we establish a theoretical framework based on the equivariant gradient degree, applicable to any number of neurons $k \geq 4$, to detect bifurcations of critical points, with their associated symmetries, from the global minimum as the leaky parameter $\alpha$ varies. Typically, our analysis reveals that a multi-mode degeneracy consistently occurs at the critical number 0, independent of $k$. (2) As a by-product, we further show that such bifurcations are width-independent and arise only for nonnegative $\alpha$, and that the global minimum undergoes no further symmetry-breaking instability throughout the engineering regime $\alpha \in (0,1)$. An explicit example with $k = 5$ is presented to illustrate the framework and exhibit the resulting bifurcations together with their symmetries.
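To make the setting concrete, here is a minimal sketch of the kind of objective the abstract describes: a mean-squared loss for a shallow leaky-ReLU network under standard Gaussian inputs, estimated by Monte Carlo sampling. The abstract does not give the exact formulation, so several choices are assumptions made only for illustration: the network is taken to be a plain sum of hidden units, the target is a second network of the same form with identity weights `V`, the input dimension equals the width `k`, and the names `network` and `mse_loss` are hypothetical.

```python
import random


def leaky_relu(z, alpha):
    """Leaky ReLU: identity for z >= 0, slope alpha for z < 0."""
    return z if z >= 0.0 else alpha * z


def network(W, x, alpha):
    """Assumed shallow network: sum over hidden neurons of leaky_relu(<w_i, x>)."""
    return sum(
        leaky_relu(sum(wi * xi for wi, xi in zip(w, x)), alpha) for w in W
    )


def mse_loss(W, V, alpha, n_samples=2000, seed=0):
    """Monte Carlo estimate of 0.5 * E[(f_W(x) - f_V(x))^2] with x ~ N(0, I)."""
    rng = random.Random(seed)
    d = len(W[0])
    total = 0.0
    for _ in range(n_samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        diff = network(W, x, alpha) - network(V, x, alpha)
        total += diff * diff
    return 0.5 * total / n_samples


k = 5          # width used in the abstract's explicit example
alpha = 0.1    # a leaky parameter inside the regime (0, 1)

# Assumed target weights: the k x k identity matrix.
V = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]

# Reordering the hidden neurons leaves the network function unchanged.
W_perm = V[1:] + V[:1]

# A small perturbation of one weight moves away from the minimum.
W_pert = [row[:] for row in V]
W_pert[0][1] = 0.3

loss_at_target = mse_loss(V, V, alpha)
loss_permuted = mse_loss(W_perm, V, alpha)
loss_perturbed = mse_loss(W_pert, V, alpha)
```

Under these assumptions the loss vanishes at the target weights and at any reordering of the hidden neurons, which is the simplest instance of the symmetries the abstract refers to. Detecting how critical points branch from this minimum as `alpha` varies requires the equivariant gradient degree analysis and is not attempted here.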