LG OC Nov 24, 2024

Beyond adaptive gradient: Fast-Controlled Minibatch Algorithm for large-scale optimization

arXiv:2411.15795v3
1 citation
h-index: 21
Originality: Incremental advance
AI Analysis

The paper addresses efficiency and convergence issues in large-scale optimization for deep learning practitioners, though it appears to be an incremental enhancement of existing optimization methods.

The paper tackles limitations of adaptive gradient methods in deep learning by introducing F-CMA, a fast-controlled minibatch algorithm with random reshuffling and line-search, which reduces training time by up to 68% and improves accuracy by up to 5% in classification tasks.

Adaptive gradient methods have been increasingly adopted by the deep learning community due to their fast convergence and reduced sensitivity to hyper-parameters. However, these methods come with limitations, such as increased memory requirements for elements like moving averages and a poorly understood convergence theory. To overcome these challenges, we introduce F-CMA, a Fast-Controlled Mini-batch Algorithm with a random reshuffling method featuring a sufficient decrease condition and a line-search procedure to ensure loss reduction per epoch, along with its deterministic proof of global convergence to a stationary point. To evaluate F-CMA, we integrate it into conventional training protocols for classification tasks involving both convolutional neural networks and vision transformer models, allowing for a direct comparison with popular optimizers. Computational tests show significant improvements, including a decrease in overall training time of up to 68%, an increase in per-epoch efficiency of up to 20%, and an improvement in model accuracy of up to 5%.
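
The abstract's three ingredients (random reshuffling, a sufficient decrease test, and a line search that keeps the loss from increasing across epochs) can be illustrated with a toy sketch. This is not the authors' F-CMA implementation: the least-squares problem, the names `full_loss`, `sample_grad`, `controlled_epoch` and `DATA`, the constants `gamma` and `shrink`, and the exact acceptance and fallback rules are all assumptions chosen only to make the idea concrete.

```python
import random

# Hypothetical toy dataset: pairs (a, b) with b = 2*a[0] - 1*a[1].
DATA = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0),
        ([1.0, 1.0], 1.0), ([2.0, 1.0], 3.0)]

def full_loss(w, data):
    """Mean of 0.5 * (a.w - b)^2 over the whole dataset."""
    total = 0.0
    for a, b in data:
        r = sum(ai * wi for ai, wi in zip(a, w)) - b
        total += 0.5 * r * r
    return total / len(data)

def sample_grad(w, a, b):
    """Gradient of 0.5 * (a.w - b)^2 for a single sample."""
    r = sum(ai * wi for ai, wi in zip(a, w)) - b
    return [r * ai for ai in a]

def controlled_epoch(w, data, step, gamma=1e-4, shrink=0.5,
                     max_backtracks=20, rng=random):
    """One epoch: reshuffled per-sample steps, then a controlled acceptance.

    Assumed rule (illustrative only): accept the trial point if the full
    loss dropped by at least gamma * step; otherwise backtrack along the
    epoch direction and reduce the step size for the next epoch.
    """
    f_old = full_loss(w, data)
    order = list(range(len(data)))
    rng.shuffle(order)  # random reshuffling: a fresh permutation per epoch
    trial = list(w)
    for i in order:
        a, b = data[i]
        g = sample_grad(trial, a, b)
        trial = [t - step * gi for t, gi in zip(trial, g)]

    # Sufficient decrease test on the full objective.
    if full_loss(trial, data) <= f_old - gamma * step:
        return trial, step

    # Fallback: backtracking line search along d = trial - w.
    d = [t - wi for t, wi in zip(trial, w)]
    d_sq = sum(di * di for di in d)
    alpha = shrink
    for _ in range(max_backtracks):
        cand = [wi + alpha * di for wi, di in zip(w, d)]
        if full_loss(cand, data) <= f_old - gamma * alpha * d_sq:
            return cand, step * shrink
        alpha *= shrink
    # No acceptable point found: keep w so the loss cannot increase.
    return list(w), step * shrink
```

Calling `controlled_epoch` in a loop, starting from for example `w = [0.0, 0.0]` and `step = 0.1`, yields a sequence of full-batch losses that never increases, because every returned point either passes a decrease test or equals the previous iterate. The extra full-loss evaluations are the price of that control, which is the trade-off such schemes have to manage at scale.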

Code Implementations: 1 repo