LG RT Oct 31, 2022

Symmetries, flat minima, and the conserved quantities of gradient flow

arXiv:2210.17216v2, 39 citations, h-index: 32
Originality: Incremental advance
AI Analysis

This work addresses a theoretical gap in deep learning, the structure of the loss landscape and its implications for training dynamics. The advance is incremental: it extends existing symmetry analysis to nonlinear networks.

The paper asks where the low-loss valleys in deep network loss landscapes come from. It develops a framework for finding continuous symmetries in parameter space, which carve out these valleys. The symmetries also enable ensemble building that improves robustness under certain adversarial attacks, and the associated conserved quantities give insight into how initialization affects convergence and generalizability.

Empirical studies of the loss landscape of deep networks have revealed that many local minima are connected through low-loss valleys. Yet, little is known about the theoretical origin of such valleys. We present a general framework for finding continuous symmetries in the parameter space, which carve out low-loss valleys. Our framework uses equivariances of the activation functions and can be applied to different layer architectures. To generalize this framework to nonlinear neural networks, we introduce a novel set of nonlinear, data-dependent symmetries. These symmetries can transform a trained model such that it performs similarly on new samples, which allows ensemble building that improves robustness under certain adversarial attacks. We then show that conserved quantities associated with linear symmetries can be used to define coordinates along low-loss valleys. The conserved quantities help reveal that using common initialization methods, gradient flow only explores a small part of the global minimum. By relating conserved quantities to convergence rate and sharpness of the minimum, we provide insights on how initialization impacts convergence and generalizability.
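The two central ideas of the abstract, a continuous symmetry that leaves the loss unchanged and a quantity conserved by gradient flow, can be illustrated on the simplest case: a two-layer linear network x -> U V x. The sketch below is not the paper's code. The model, the dimensions, the names `loss`, `grads`, `conserved`, and `demo`, and the use of small-step gradient descent as a stand-in for gradient flow are all assumptions made for illustration. It covers only a linear symmetry, not the nonlinear, data-dependent symmetries the paper introduces.

```python
import numpy as np


def loss(U, V, X, Y):
    """Mean squared error of the two-layer linear network x -> U @ V @ x."""
    residual = U @ V @ X - Y
    return 0.5 * np.sum(residual ** 2) / X.shape[1]


def grads(U, V, X, Y):
    """Gradients of the loss with respect to U and V."""
    G = (U @ V @ X - Y) @ X.T / X.shape[1]  # dL/d(UV)
    return G @ V.T, U.T @ G


def conserved(U, V):
    """Q = U^T U - V V^T, constant along exact gradient flow for this model."""
    return U.T @ U - V @ V.T


def demo(seed=0, lr=1e-3, steps=5000):
    rng = np.random.default_rng(seed)
    m, h, n, N = 2, 3, 4, 50  # output, hidden, input dims and sample count (assumed)
    X = rng.standard_normal((n, N))
    Y = rng.standard_normal((m, n)) @ X
    U = 0.5 * rng.standard_normal((m, h))
    V = 0.5 * rng.standard_normal((h, n))

    # Symmetry: (U, V) -> (U g^{-1}, g V) leaves U @ V, hence the loss, unchanged.
    g = np.eye(h) + 0.1 * rng.standard_normal((h, h))
    sym_gap = abs(loss(U @ np.linalg.inv(g), g @ V, X, Y) - loss(U, V, X, Y))

    # Small-step gradient descent as a stand-in for gradient flow.
    loss0, Q0 = loss(U, V, X, Y), conserved(U, V)
    for _ in range(steps):
        dU, dV = grads(U, V, X, Y)
        U, V = U - lr * dU, V - lr * dV
    drift = np.linalg.norm(conserved(U, V) - Q0)
    return sym_gap, loss0, loss(U, V, X, Y), drift


if __name__ == "__main__":
    sym_gap, loss0, loss1, drift = demo()
    print(f"loss change under symmetry: {sym_gap:.2e}")
    print(f"loss before/after training: {loss0:.4f} / {loss1:.4f}")
    print(f"drift of Q during training: {drift:.2e}")
```

In exact gradient flow the first-order changes to Q cancel, so the drift is zero. With a finite step size each update changes Q only at second order in `lr`, so the drift should shrink as `lr` is reduced.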

Code Implementations: 1 repo
