LG | Nov 29, 2024

On the Performance Analysis of Momentum Method: A Frequency Domain Perspective

arXiv:2411.19671v6 | 5 citations | h-index: 4 | ICLR
Originality: Incremental advance
AI Analysis

This work addresses an open question for practitioners of stochastic gradient methods: how to choose momentum coefficients. The advance is incremental, since it builds on existing momentum-based optimizers.

The paper tackles the problem of selecting momentum coefficients in neural network training by analyzing momentum as a time-variant filter on gradients in the frequency domain. This leads to the proposed optimizer, FSGDM, which the authors report outperforms conventional momentum optimizers in their experiments.

Momentum-based optimizers are widely adopted for training neural networks. However, the optimal selection of momentum coefficients remains elusive. This uncertainty impedes a clear understanding of the role of momentum in stochastic gradient methods. In this paper, we present a frequency domain analysis framework that interprets the momentum method as a time-variant filter for gradients, where adjustments to momentum coefficients modify the filter characteristics. Our experiments support this perspective and provide a deeper understanding of the mechanism involved. Moreover, our analysis reveals the following significant findings: high-frequency gradient components are undesired in the late stages of training; preserving the original gradient in the early stages, and gradually amplifying low-frequency gradient components during training both enhance performance. Based on these insights, we propose Frequency Stochastic Gradient Descent with Momentum (FSGDM), a heuristic optimizer that dynamically adjusts the momentum filtering characteristic with an empirically effective dynamic magnitude response. Experimental results demonstrate the superiority of FSGDM over conventional momentum optimizers.
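The filter view in the abstract can be illustrated with a minimal sketch. It assumes the textbook momentum recursion m_t = u * m_{t-1} + v * g_t with fixed coefficients, which is a first-order filter with transfer function H(z) = v / (1 - u z^{-1}). The names `momentum_filter`, `magnitude_response`, `u`, and `v` are illustrative assumptions and are not taken from the paper; the paper's FSGDM varies the coefficients over training, which this fixed-coefficient sketch does not reproduce.

```python
import numpy as np


def momentum_filter(grads, u, v):
    """Run m_t = u * m_{t-1} + v * g_t over a 1-D gradient sequence (m_0 = 0)."""
    m = 0.0
    out = np.empty(len(grads), dtype=float)
    for t, g in enumerate(grads):
        m = u * m + v * g
        out[t] = m
    return out


def magnitude_response(u, v, omega):
    """|H(e^{j*omega})| for H(z) = v / (1 - u * z^{-1}); omega in radians/sample."""
    omega = np.asarray(omega, dtype=float)
    return np.abs(v / (1.0 - u * np.exp(-1j * omega)))


# Gain at the lowest (omega = 0) and highest (omega = pi) frequencies.
omega = np.array([0.0, np.pi])
ema_gain = magnitude_response(0.9, 0.1, omega)    # EMA-style coefficients (v = 1 - u)
heavy_gain = magnitude_response(0.9, 1.0, omega)  # heavy-ball-style coefficients (v = 1)

# Time-domain check: a constant gradient is the omega = 0 component,
# an alternating-sign gradient is the omega = pi component.
constant_out = momentum_filter(np.ones(500), 0.9, 0.1)
alternating_out = momentum_filter((-1.0) ** np.arange(500), 0.9, 0.1)
```

Under these assumptions, both coefficient choices attenuate the highest frequency relative to the lowest by the same ratio (1 - u) / (1 + u), while `v` sets the overall gain: the EMA-style choice passes a constant gradient unchanged, and the heavy-ball-style choice amplifies it by 1 / (1 - u). Changing `u` and `v` during training therefore changes which gradient frequencies are preserved or amplified, which is the knob the abstract describes.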

Code Implementations: 1 repo