LGOCMay 21, 2023

Understanding Multi-phase Optimization Dynamics and Rich Nonlinear Behaviors of ReLU Networks

arXiv:2305.12467v517 citations
Originality Incremental advance
AI Analysis

This work provides foundational insights into the optimization dynamics of neural networks, though it is incremental as it focuses on a simplified setting.

The authors tackled the challenge of theoretically characterizing the full training process of two-layer ReLU networks on linearly separable data, revealing four distinct phases and specific nonlinear behaviors like initial condensation and plateau escape.

The training process of ReLU neural networks often exhibits complicated nonlinear phenomena. The nonlinearity of models and non-convexity of loss pose significant challenges for theoretical analysis. Therefore, most previous theoretical works on the optimization dynamics of neural networks focus either on local analysis (like the end of training) or approximate linear models (like Neural Tangent Kernel). In this work, we conduct a complete theoretical characterization of the training process of a two-layer ReLU network trained by Gradient Flow on a linearly separable data. In this specific setting, our analysis captures the whole optimization process starting from random initialization to final convergence. Despite the relatively simple model and data that we studied, we reveal four different phases from the whole training process showing a general simplifying-to-complicating learning trend. Specific nonlinear behaviors can also be precisely identified and captured theoretically, such as initial condensation, saddle-to-plateau dynamics, plateau escape, changes of activation patterns, learning with increasing complexity, etc.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes