LG · AI · ML · Aug 28, 2024

Remove Symmetries to Control Model Expressivity and Improve Optimization

MIT
arXiv:2408.15495v4 · 6 citations · h-index: 18
Originality: Incremental advance
AI Analysis

This paper addresses a fundamental optimization issue in deep learning that affects training across many applications. The contribution appears incremental, however, because it builds on previously known problems caused by symmetry.

The paper tackles a problem that hinders training: symmetries in the loss function can trap neural networks in low-capacity states. It proposes syre, a model-agnostic algorithm that removes these symmetries. The authors report that removing them correlates with improved optimization or performance.

When symmetry is present in the loss function, the model is likely to be trapped in a low-capacity state that is sometimes known as a "collapse". Being trapped in these low-capacity states can be a major obstacle to training across many scenarios where deep learning technology is applied. We first prove two concrete mechanisms through which symmetries lead to reduced capacities and ignored features during training and inference. We then propose a simple and theoretically justified algorithm, syre, to remove almost all symmetry-induced low-capacity states in neural networks. When this type of entrapment is especially a concern, removing symmetries with the proposed method is shown to correlate well with improved optimization or performance. A remarkable merit of the proposed method is that it is model-agnostic and does not require any knowledge of the symmetry.
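
The toy sketch below illustrates the collapse the abstract describes. It rests on our own assumptions and does not reproduce the paper's two mechanisms. The model is f(x) = u·w·x, whose loss is unchanged under the sign flip (u, w) → (−u, −w). At the symmetric point u = w = 0 the gradient is exactly zero, so gradient descent started there never leaves the zero-output state.

The symmetry-breaking construction shown afterwards is a hypothetical generic choice made for illustration. It is not presented as the syre algorithm, whose details are not given here. The construction pairs a fixed random offset on the parameters with weight decay. An offset alone only moves the symmetry to θ → −θ − 2θ₀, with a new fixed point at θ = −θ₀. Weight decay is not invariant under that shifted map, which makes the point non-stationary. The names `grad`, `train`, and `effective_product`, the data, and all hyperparameters are illustrative.

```python
import random

# Toy regression data with y = 2x, so the ideal effective product a * b is 2.0.
XS = [0.5, 1.0, 1.5, 2.0]
YS = [2.0 * x for x in XS]


def grad(u, w, offset=(0.0, 0.0), wd=0.0):
    """Gradient of mean((a*b*x - y)^2) + wd*(u^2 + w^2) with respect to (u, w).

    Effective weights are a = u + offset[0], b = w + offset[1]. With
    offset = (0, 0) and wd = 0 this is the plain sign-symmetric model.
    """
    a, b = u + offset[0], w + offset[1]
    n = len(XS)
    gu = sum(2.0 * (a * b * x - y) * b * x for x, y in zip(XS, YS)) / n + 2.0 * wd * u
    gw = sum(2.0 * (a * b * x - y) * a * x for x, y in zip(XS, YS)) / n + 2.0 * wd * w
    return gu, gw


def train(u, w, offset=(0.0, 0.0), wd=0.0, lr=0.05, steps=2000):
    """Full-batch gradient descent on the trainable parameters (u, w)."""
    for _ in range(steps):
        gu, gw = grad(u, w, offset, wd)
        u, w = u - lr * gu, w - lr * gw
    return u, w


def effective_product(u, w, offset=(0.0, 0.0)):
    """The quantity the model actually uses: (u + offset[0]) * (w + offset[1])."""
    return (u + offset[0]) * (w + offset[1])


# 1) Plain model started on the symmetric point u = w = 0: the gradient is
#    exactly zero there, so training never leaves the collapsed state.
u1, w1 = train(0.0, 0.0)
plain = effective_product(u1, w1)

# 2) Fixed random offset alone: the loss is still invariant under
#    theta -> -theta - 2*offset, whose fixed point is u = -offset. Started there,
#    the effective weights are exactly zero and training is stuck again.
rng = random.Random(0)
offset = (rng.gauss(0.0, 0.1), rng.gauss(0.0, 0.1))
u2, w2 = train(-offset[0], -offset[1], offset=offset)
offset_only = effective_product(u2, w2, offset)

# 3) Offset plus weight decay: the decay term is not invariant under the shifted
#    map, so the same starting point has a nonzero gradient and training escapes.
u3, w3 = train(-offset[0], -offset[1], offset=offset, wd=1e-3)
offset_and_wd = effective_product(u3, w3, offset)

print(plain, offset_only, offset_and_wd)
```

In this toy, the first two runs stay at an effective product of zero. The third run moves close to the target value 2.0, shifted slightly by the weight-decay penalty. The toy has a single mirror symmetry. Real networks have many more, such as permutation and rescaling symmetries, so the sketch only illustrates the stationarity argument.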
