LGOCMay 21, 2024

Keep the Momentum: Conservation Laws beyond Euclidean Gradient Flows

arXiv:2405.12888v19 citationsh-index: 6ICML
Originality Highly original
AI Analysis

This work addresses a theoretical gap in understanding conservation laws for machine learning optimization, which is incremental but provides foundational insights for researchers in optimization and neural network training.

The paper tackles the problem of characterizing conservation laws for momentum-based dynamics in non-Euclidean geometries, proving that these laws exhibit temporal dependence and often result in a 'conservation loss' compared to gradient flows, with specific findings such as fewer laws for linear networks and none for ReLU networks.

Conservation laws are well-established in the context of Euclidean gradient flow dynamics, notably for linear or ReLU neural network training. Yet, their existence and principles for non-Euclidean geometries and momentum-based dynamics remain largely unknown. In this paper, we characterize "all" conservation laws in this general setting. In stark contrast to the case of gradient flows, we prove that the conservation laws for momentum-based dynamics exhibit temporal dependence. Additionally, we often observe a "conservation loss" when transitioning from gradient flow to momentum dynamics. Specifically, for linear networks, our framework allows us to identify all momentum conservation laws, which are less numerous than in the gradient flow case except in sufficiently over-parameterized regimes. With ReLU networks, no conservation law remains. This phenomenon also manifests in non-Euclidean metrics, used e.g. for Nonnegative Matrix Factorization (NMF): all conservation laws can be determined in the gradient flow context, yet none persists in the momentum case.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes