From Stability to Chaos: Analyzing Gradient Descent Dynamics in Quadratic Regression
This work offers machine learning practitioners insight into training dynamics, though the contribution is incremental: it builds on existing analyses of gradient descent in specific models.
The paper investigates gradient descent dynamics with large constant step-sizes in quadratic regression, revealing five distinct training phases (monotonic, catapult, periodic, chaotic, divergent) through bifurcation analysis, and shows that ergodic trajectory averaging stabilizes test error in non-monotonic phases.
We conduct a comprehensive investigation into the dynamics of gradient descent using large-order constant step-sizes in the context of quadratic regression models. Within this framework, we reveal that the dynamics can be encapsulated by a specific cubic map, naturally parameterized by the step-size. Through a fine-grained bifurcation analysis concerning the step-size parameter, we delineate five distinct training phases: (1) monotonic, (2) catapult, (3) periodic, (4) chaotic, and (5) divergent, precisely demarcating the boundaries of each phase. As illustrations, we provide examples involving phase retrieval and two-layer neural networks employing quadratic activation functions and constant outer-layers, utilizing orthogonal training data. Our simulations indicate that these five phases also manifest with generic non-orthogonal data. We also empirically investigate the generalization performance when training in the various non-monotonic (and non-divergent) phases. In particular, we observe that performing ergodic trajectory averaging stabilizes the test error in non-monotonic (and non-divergent) phases.
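To make the step-size dependence concrete, the following minimal sketch iterates gradient descent on a toy problem. It is an illustration under stated assumptions, not the paper's exact cubic map or its phase boundaries: the model is a hypothetical one-parameter, one-sample phase-retrieval loss L(w) = (w^2 - y)^2 / 4, whose gradient step w <- w - eta * w * (w^2 - y) happens to be cubic in w. The names gd_trajectory, running_average, and the blowup threshold are introduced here for illustration only. The running average stands in for ergodic trajectory averaging by averaging a per-iterate quantity (here the model output w^2) along the trajectory.

```python
# Toy illustration (an assumption, not the paper's exact cubic map):
# one-sample, one-parameter phase retrieval with loss L(w) = (w^2 - y)^2 / 4.

def loss(w, y=1.0):
    """Squared loss of the quadratic model w^2 against the target y."""
    return 0.25 * (w * w - y) ** 2


def gd_trajectory(eta, w0=0.5, y=1.0, steps=200, blowup=1e6):
    """Run constant step-size gradient descent.

    Returns (iterates, diverged). The update is cubic in w:
        w <- w - eta * w * (w^2 - y)
    Iteration stops early once |w| exceeds the (hypothetical) blowup bound.
    """
    ws = [w0]
    for _ in range(steps):
        w = ws[-1]
        w_next = w - eta * w * (w * w - y)
        if abs(w_next) > blowup:
            return ws, True
        ws.append(w_next)
    return ws, False


def running_average(xs):
    """Cumulative mean of a sequence, one entry per prefix."""
    out, total = [], 0.0
    for i, x in enumerate(xs, start=1):
        total += x
        out.append(total / i)
    return out


if __name__ == "__main__":
    # Step sizes chosen by hand for this toy problem only.
    for eta in (0.1, 1.1, 3.0):
        ws, diverged = gd_trajectory(eta)
        losses = [loss(w) for w in ws]
        monotone = all(b <= a for a, b in zip(losses, losses[1:]))
        avg_output = running_average([w * w for w in ws])[-1]
        print(f"eta={eta}: diverged={diverged}, monotone_loss={monotone}, "
              f"final_loss={losses[-1]:.3e}, averaged_output={avg_output:.4f}")
```

In this toy setting a small step size gives a monotonically decreasing loss, a moderately larger one gives a bounded but non-monotonic loss, and a sufficiently large one diverges. The averaged output is only meaningful in the non-divergent runs.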