Remove Symmetries to Control Model Expressivity and Improve Optimization
The work addresses a fundamental optimization issue in deep learning that affects training across many applications, although it appears incremental because it builds on previously known symmetry problems.
The paper tackles the problem of neural networks becoming trapped in low-capacity states caused by symmetries in the loss function, which hinders training. It proposes a model-agnostic algorithm, syre, that removes these symmetries, leading to improved optimization and performance.
When symmetry is present in the loss function, the model is likely to become trapped in a low-capacity state, sometimes known as a "collapse". Being trapped in these low-capacity states can be a major obstacle to training in many scenarios where deep learning is applied. We first prove two concrete mechanisms through which symmetries lead to reduced capacity and ignored features during training and inference. We then propose a simple and theoretically justified algorithm, syre, to remove almost all symmetry-induced low-capacity states in neural networks. In settings where this type of entrapment is a particular concern, removing symmetries with the proposed method is shown to correlate well with improved optimization or performance. A notable merit of the proposed method is that it is model-agnostic and does not require any knowledge of the symmetry.
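The abstract does not spell out how syre works, so the following sketch is illustrative only. It uses a two-parameter toy loss, (uw - 1)^2, which is unchanged under the sign flip (u, w) -> (-u, -w). Its symmetric point u = w = 0 is a low-capacity state in which the output uw is zero. With strong weight decay, plain gradient descent is driven into that state. The sketch then applies one generic symmetry-breaking construction. This construction is an assumption made here for illustration and is not taken from the paper: the model sees θ + θ₀ for a fixed random offset θ₀, and weight decay acts on the trainable θ. The names `grad_base`, `train`, `GAMMA`, and `offset` are hypothetical. The paper should be consulted for syre's actual procedure.

```python
import random

# Weight-decay strength. For this toy loss, GAMMA > 1 makes the origin the only
# stationary point (hence the only minimum) of the plainly regularized loss.
GAMMA = 1.5


def grad_base(u, w):
    """Gradient of the toy loss (u*w - 1)**2, which is invariant under the
    sign flip (u, w) -> (-u, -w); the origin is the symmetric, collapsed state."""
    r = u * w - 1.0
    return 2.0 * r * w, 2.0 * r * u


def train(offset, init=(0.3, 0.2), steps=5000, lr=0.01):
    """Gradient descent on loss(theta + offset) + GAMMA * ||theta||^2.

    offset = (0, 0) is plain weight decay. A nonzero fixed offset (a hypothetical
    stand-in for a symmetry-removal step) moves the point the regularizer pulls
    toward away from the symmetric state. Returns the effective parameters
    theta + offset that the model actually uses.
    """
    tu, tw = init
    ou, ow = offset
    for _ in range(steps):
        gu, gw = grad_base(tu + ou, tw + ow)
        tu -= lr * (gu + 2.0 * GAMMA * tu)
        tw -= lr * (gw + 2.0 * GAMMA * tw)
    return tu + ou, tw + ow


if __name__ == "__main__":
    u, w = train(offset=(0.0, 0.0))
    print("plain weight decay, output u*w =", u * w)  # collapses to ~0
    rng = random.Random(0)
    # Offset drawn once and then held fixed for the whole run.
    offset = (rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5))
    u, w = train(offset=offset)
    print("with fixed random offset, output u*w =", u * w)
```

With a zero offset, a run started exactly at the origin never moves, because the gradient there vanishes. With a nonzero offset, the gradient at the effective origin is -2γθ₀ rather than zero, so that symmetric state is no longer a stationary point. This toy case shows the kind of entrapment that the abstract says symmetry removal targets.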