Phase transitions reveal hierarchical structure in deep neural networks
This work offers a geometric explanation for key training phenomena in deep neural networks. The contribution is incremental, but it clarifies the underlying mechanisms for machine learning researchers.
The authors address the problem of understanding training phenomena in deep neural networks by unifying phase transitions, saddle points, and mode connectivity within a single geometric framework. They show analytically that phase transitions are governed by saddle points in the loss landscape, and they confirm mode connectivity on MNIST with an efficient algorithm based on the L2 regularizer.
Training deep neural networks (DNNs) relies on the model converging on a high-dimensional, non-convex loss landscape toward a good minimum. Yet much of the phenomenology of training remains poorly understood. We focus on three seemingly disparate observations: the occurrence of phase transitions reminiscent of statistical physics, the ubiquity of saddle points, and the phenomenon of mode connectivity, which is relevant for model merging. We unify these within a single explanatory framework: the geometry of the loss and error landscapes. We show analytically that phase transitions in DNN learning are governed by saddle points in the loss landscape. Building on this insight, we introduce a simple, fast, and easy-to-implement algorithm that uses the L2 regularizer as a tool to probe the geometry of error landscapes. We apply it to confirm mode connectivity in DNNs trained on the MNIST dataset by efficiently finding paths that connect global minima. We then show numerically that saddle points induce transitions between models that encode distinct digit classes. Our work establishes the geometric origin of key training phenomena in DNNs and reveals a hierarchy of accuracy basins analogous to phases in statistical physics.
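The link between saddle points and phase transitions under L2 regularization can be illustrated on a toy problem. The sketch below is not the algorithm from this work; it assumes a hypothetical two-parameter linear "network" with output a*b, a single target y = 1, and the regularized loss 0.5*(y - a*b)^2 + 0.5*lam*(a^2 + b^2). In this toy model the Hessian at the origin has eigenvalues lam - y and lam + y, so the origin is a saddle point for lam < y and becomes the only minimum for lam > y. The learned product a*b therefore drops from y - lam to zero at lam = y, a transition controlled by the saddle. All function names and hyperparameters are illustrative choices.

```python
import numpy as np

def loss(a, b, lam, y=1.0):
    """L2-regularized squared error of the toy model f = a * b."""
    return 0.5 * (y - a * b) ** 2 + 0.5 * lam * (a ** 2 + b ** 2)

def hessian_at_origin(lam, y=1.0):
    """Analytic Hessian of the loss at (a, b) = (0, 0)."""
    return np.array([[lam, -y], [-y, lam]])

def minimize(lam, y=1.0, lr=0.05, steps=5000, init=(0.3, 0.2)):
    """Plain gradient descent on (a, b) from a fixed starting point."""
    a, b = init
    for _ in range(steps):
        r = y - a * b                # residual
        grad_a = -r * b + lam * a
        grad_b = -r * a + lam * b
        a, b = a - lr * grad_a, b - lr * grad_b
    return a, b

if __name__ == "__main__":
    for lam in (0.25, 0.75, 1.5):
        a, b = minimize(lam)
        # Smallest Hessian eigenvalue at the origin: negative means saddle.
        eig_min = np.linalg.eigvalsh(hessian_at_origin(lam))[0]
        kind = "saddle" if eig_min < 0 else "minimum"
        print(f"lam={lam:.2f}  origin is a {kind}  a*b={a * b:.4f}")
```

Sweeping lam across the critical value plays the role of a control parameter: below it, gradient descent escapes the origin along the negative-curvature direction and settles at a nonzero solution; above it, the origin is stable and the model collapses to zero output.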