The Underlying Correlated Dynamics in Neural Training
This work addresses the computational challenge of understanding the training dynamics of large neural networks and offers a method that could lead to better acceleration techniques. It appears incremental, however, because it builds on existing approaches to modeling dynamics.
The authors model neural network training dynamics with a correlation-based approach that groups parameters into correlated modes. This reduces the dimensionality from millions of parameters to just a few modes for networks such as ResNet-18, transformers, and GANs. The model also induces regularization, which improves generalization on the test set.
Training neural networks is a computationally intensive task. Understanding and modeling the training dynamics becomes more important as increasingly large networks are trained. In this work we propose a model based on the correlation of the parameters' dynamics, which dramatically reduces the dimensionality. We refer to our algorithm as \emph{correlation mode decomposition} (CMD). It splits the parameter space into groups of parameters (modes) that behave in a highly correlated manner through the epochs. This approach achieves a remarkable dimensionality reduction: networks such as ResNet-18, transformers, and GANs, which contain millions of parameters, can be modeled well using just a few modes. We observe that the typical time profile of each mode is spread throughout the network, across all layers. Moreover, our model induces regularization, which yields better generalization capacity on the test set. This representation enhances the understanding of the underlying training dynamics and can pave the way for designing better acceleration techniques.
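To make the idea of grouping parameters by correlated dynamics concrete, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes the per-epoch parameter values are stored as a `(num_params, num_epochs)` array. It also assumes that references are picked greedily and that each parameter is modeled as an affine function of its mode's reference trajectory. The function name `correlation_modes` and all of its arguments are hypothetical.

```python
import numpy as np

def correlation_modes(trajectories, num_modes, seed=0):
    """Group parameter trajectories into correlated modes (illustrative sketch).

    trajectories: array of shape (num_params, num_epochs).
    Returns (labels, reconstruction), where labels[i] is the mode of parameter i.
    """
    rng = np.random.default_rng(seed)
    n_params, _ = trajectories.shape

    # Center each trajectory over epochs and normalize, so that dot products
    # between rows are Pearson correlations.
    centered = trajectories - trajectories.mean(axis=1, keepdims=True)
    norms = np.linalg.norm(centered, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # constant trajectories get zero correlation
    unit = centered / norms

    # Assumed heuristic: start from a random reference parameter, then keep
    # adding the parameter least correlated with the references chosen so far.
    refs = [int(rng.integers(n_params))]
    while len(refs) < num_modes:
        corr = np.abs(unit @ unit[refs].T)
        refs.append(int(np.argmin(corr.max(axis=1))))

    # Assign each parameter to the reference it is most correlated with
    # (in absolute value, so anti-correlated parameters share a mode).
    labels = np.argmax(np.abs(unit @ unit[refs].T), axis=1)

    # Least-squares affine fit per parameter: w_i(t) ~ a_i * w_ref(t) + b_i.
    reconstruction = np.empty_like(trajectories, dtype=float)
    for mode, ref in enumerate(refs):
        idx = np.where(labels == mode)[0]
        ref_centered = centered[ref]
        denom = ref_centered @ ref_centered
        if denom > 0:
            a = centered[idx] @ ref_centered / denom
        else:
            a = np.zeros(len(idx))  # constant reference: fall back to the mean
        b = trajectories[idx].mean(axis=1) - a * trajectories[ref].mean()
        reconstruction[idx] = np.outer(a, trajectories[ref]) + b[:, None]
    return labels, reconstruction
```

In this sketch, each mode is described by one reference trajectory plus two scalars per parameter. The whole training history is therefore represented by `num_modes` time profiles instead of one profile per parameter.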