The Training Process of Many Deep Networks Explores the Same Low-Dimensional Manifold
This work provides foundational insights into the training process of deep learning models, potentially benefiting researchers and practitioners by simplifying optimization and generalization analysis.
The authors tackled the problem of understanding the training dynamics of deep networks by analyzing their prediction trajectories, revealing that diverse networks explore the same low-dimensional manifold during training, with factors like architecture causing distinguishable paths but others having minimal influence.
We develop information-geometric techniques to analyze the trajectories of the predictions of deep networks during training. By examining the underlying high-dimensional probabilistic models, we reveal that the training process explores an effectively low-dimensional manifold. Networks with a wide range of architectures, sizes, trained using different optimization methods, regularization techniques, data augmentation techniques, and weight initializations lie on the same manifold in the prediction space. We study the details of this manifold to find that networks with different architectures follow distinguishable trajectories but other factors have a minimal influence; larger networks train along a similar manifold as that of smaller networks, just faster; and networks initialized at very different parts of the prediction space converge to the solution along a similar manifold.