LG · DIS-NN · May 2, 2023

The Training Process of Many Deep Networks Explores the Same Low-Dimensional Manifold

arXiv:2305.01604v3 · 30 citations
Originality: Highly original
AI Analysis

This work provides foundational insights into the training process of deep learning models, potentially benefiting researchers and practitioners by simplifying optimization and generalization analysis.

The authors tackle the problem of understanding the training dynamics of deep networks by analyzing the trajectories of their predictions. They find that diverse networks explore the same low-dimensional manifold during training: architecture produces distinguishable paths on this manifold, while other factors have minimal influence.

We develop information-geometric techniques to analyze the trajectories of the predictions of deep networks during training. By examining the underlying high-dimensional probabilistic models, we reveal that the training process explores an effectively low-dimensional manifold. Networks with a wide range of architectures and sizes, trained using different optimization methods, regularization techniques, data augmentation techniques, and weight initializations, lie on the same manifold in the prediction space. Studying this manifold in detail, we find that networks with different architectures follow distinguishable trajectories but other factors have a minimal influence; larger networks train along a similar manifold as that of smaller networks, just faster; and networks initialized at very different parts of the prediction space converge to the solution along a similar manifold.
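The abstract does not spell out the computation, so the following is only a minimal sketch of one plausible reading, under stated assumptions. We assume that each network's predictions on a fixed set of N inputs form a product of N categorical distributions. We also assume that two such models are compared with the per-sample-averaged Bhattacharyya distance, d(P, Q) = -(1/N) Σ_n log Σ_c sqrt(p_nc · q_nc). Finally, we assume that a collection of models (for example, checkpoints along several training runs) is embedded by eigendecomposing the double-centred distance matrix, in the style of intensive PCA (InPCA). The function names `bhattacharyya_distance` and `inpca_embedding`, the "eigenvalue share" summary, and the toy data are hypothetical illustrations, not the authors' released code.

```python
import numpy as np


def bhattacharyya_distance(p, q, eps=1e-12):
    """Per-sample-averaged Bhattacharyya distance between two prediction sets.

    p, q: arrays of shape (num_samples, num_classes); each row is a
    categorical distribution, e.g. softmax outputs on the same inputs.
    """
    # Bhattacharyya coefficient for each sample: sum_c sqrt(p_c * q_c)
    bc = np.sum(np.sqrt(p * q), axis=1)
    # Average -log(coefficient) over samples (an "intensive" distance);
    # clip to avoid log(0) when two rows have disjoint support.
    return float(-np.mean(np.log(np.clip(bc, eps, None))))


def inpca_embedding(models, k=2):
    """Embed a list of prediction arrays into k coordinates.

    Assumed InPCA-style procedure: double-centre the matrix of pairwise
    distances and eigendecompose it, as in classical MDS. The distances
    need not be Euclidean, so eigenvalues may be negative; they are
    ranked by absolute value.
    """
    m = len(models)
    d = np.zeros((m, m))
    for i in range(m):
        for j in range(i + 1, m):
            d[i, j] = d[j, i] = bhattacharyya_distance(models[i], models[j])

    centering = np.eye(m) - np.ones((m, m)) / m
    w = -0.5 * centering @ d @ centering  # symmetric, so eigh applies
    eigvals, eigvecs = np.linalg.eigh(w)

    order = np.argsort(-np.abs(eigvals))
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    coords = eigvecs[:, :k] * np.sqrt(np.abs(eigvals[:k]))

    # Hypothetical low-dimensionality summary: share of the total
    # absolute eigenvalue mass captured by the top-k directions.
    share = float(np.abs(eigvals[:k]).sum() / np.abs(eigvals).sum())
    return coords, eigvals, share


if __name__ == "__main__":
    # Toy "training trajectory": predictions move from uniform guesses
    # towards near-one-hot labels. Synthetic data, not from the paper.
    rng = np.random.default_rng(0)
    n, c = 200, 10
    labels = rng.integers(0, c, size=n)
    uniform = np.full((n, c), 1.0 / c)
    target = np.eye(c)[labels] * 0.9 + 0.01  # each row sums to 1
    trajectory = [(1 - t) * uniform + t * target for t in np.linspace(0, 1, 8)]

    coords, eigvals, share = inpca_embedding(trajectory, k=2)
    print("top-2 embedding coordinates:\n", coords)
    print("share of eigenvalue mass in top 2:", round(share, 3))
```

Averaging over samples keeps the distance from growing with N, which is what lets models evaluated on large datasets be compared on one scale. A large eigenvalue share in the first few directions is one rough indication that the models lie near a low-dimensional manifold.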

Code Implementations: 2 repos
