On the Rate of Convergence of GD in Non-linear Neural Networks: An Adversarial Robustness Perspective
This provides foundational insights into adversarial robustness in non-linear neural networks, though it is incremental as it focuses on a simplified setting.
The paper tackles the problem of Gradient Descent (GD) convergence in a minimal two-neuron ReLU network for binary classification, proving that it achieves optimal robustness margin but at a prohibitively slow rate of Θ(1/ln(t)), and empirically shows this rate persists across initializations.
We study the convergence dynamics of Gradient Descent (GD) in a minimal binary classification setting, consisting of a two-neuron ReLU network and two training instances. We prove that even under these strong simplifying assumptions, while GD successfully converges to an optimal robustness margin, effectively maximizing the distance between the decision boundary and the training points, this convergence occurs at a prohibitively slow rate, scaling strictly as $Θ(1/\ln(t))$. To the best of our knowledge, this establishes the first explicit lower bound on the convergence rate of the robustness margin in a non-linear model. Through empirical simulations, we further demonstrate that this inherent failure mode is pervasive, exhibiting the exact same tight convergence rate across multiple natural network initializations. Our theoretical guarantees are derived via a rigorous analysis of the GD trajectories across the distinct activation patterns of the model. Specifically, we develop tight control over the system's dynamics to bound the trajectory of the decision boundary, overcoming the primary technical challenge introduced by the non-linear nature of the architecture.