Regularization implies balancedness in the deep linear network
This work provides a theoretical foundation for balancedness in deep learning. The contribution is incremental: it builds on existing geometric invariant theory and applies it to linear networks.
The paper studies training dynamics in deep linear networks. It shows that $L^2$ regularization leads to balancedness, decomposes the dynamics into separate regularizing and learning flows, and provides a mathematical framework linking balancedness to model reduction and Bayesian principles.
We use geometric invariant theory (GIT) to study the deep linear network (DLN). The Kempf-Ness theorem is used to establish that the $L^2$ regularizer is minimized on the balanced manifold. This allows us to decompose the training dynamics into two distinct gradient flows: a regularizing flow on fibers and a learning flow on the balanced manifold. We show that the regularizing flow is exactly solvable using the moment map. This approach provides a common mathematical framework for balancedness in deep learning and linear systems theory. We use this framework to interpret balancedness in terms of model reduction and Bayesian principles.
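The claim that the $L^2$ regularizer is minimized on the balanced manifold can be checked numerically in the simplest case. The following is a minimal sketch, not the paper's construction. It assumes a two-layer network $W = W_2 W_1$, takes "balanced" to mean $W_1 W_1^T = W_2^T W_2$ (the usual convention, assumed here), and moves along a fiber with the group action $(W_1, W_2) \mapsto (G W_1, W_2 G^{-1})$, which leaves the product unchanged. The helper names `balanced_factors`, `imbalance`, and `l2_penalty` are hypothetical.

```python
import numpy as np


def balanced_factors(W):
    """Factor W = W2 @ W1 with W1 @ W1.T == W2.T @ W2, via the SVD (illustrative helper)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    root = np.sqrt(s)
    W1 = root[:, None] * Vt   # diag(sqrt(s)) @ Vt
    W2 = U * root[None, :]    # U @ diag(sqrt(s))
    return W1, W2


def imbalance(W1, W2):
    """Deviation from balancedness; zero exactly on the balanced manifold."""
    return W1 @ W1.T - W2.T @ W2


def l2_penalty(W1, W2):
    """Sum of squared Frobenius norms of the two factors."""
    return np.sum(W1 ** 2) + np.sum(W2 ** 2)


rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))  # assumed end-to-end matrix, for illustration only

# A balanced point on the fiber {(W1, W2) : W2 @ W1 == W}.
W1, W2 = balanced_factors(W)

# Move along the same fiber with an invertible, non-orthogonal G.
# A @ A.T + I is symmetric positive definite, hence always invertible.
A = rng.standard_normal((3, 3))
G = A @ A.T + np.eye(3)
W1g = G @ W1
W2g = W2 @ np.linalg.inv(G)

print("product preserved:    ", np.allclose(W2g @ W1g, W))
print("imbalance at balanced:", np.linalg.norm(imbalance(W1, W2)))
print("imbalance after G:    ", np.linalg.norm(imbalance(W1g, W2g)))
print("penalty at balanced:  ", l2_penalty(W1, W2))
print("penalty after G:      ", l2_penalty(W1g, W2g))
```

In this two-layer setting the penalty at the balanced point equals twice the sum of the singular values of $W$, and every other point on the fiber has a penalty at least as large. An orthogonal $G$ would leave both the penalty and the balancedness unchanged, which is why the sketch uses a non-orthogonal one.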