CELGNA Mar 20, 2024

Improving the Adaptive Moment Estimation (ADAM) stochastic optimizer through an Implicit-Explicit (IMEX) time-stepping approach

arXiv:2403.13704v2, 6 citations, h-index: 11
Originality: Incremental advance
AI Analysis

This work offers an incremental improvement to Adam, a widely used optimizer for neural network training.

The authors show that Adam corresponds to an underlying ODE and that the classical algorithm is a first-order IMEX Euler discretization of it. They then apply higher-order IMEX time-stepping methods to that ODE, yielding a new algorithm reported to outperform classical Adam on several regression and classification problems.

The Adam optimizer, often used in Machine Learning for neural network training, corresponds to an underlying ordinary differential equation (ODE) in the limit of very small learning rates. This work shows that the classical Adam algorithm is a first-order implicit-explicit (IMEX) Euler discretization of the underlying ODE. Employing the time discretization point of view, we propose new extensions of the Adam scheme obtained by using higher-order IMEX methods to solve the ODE. Based on this approach, we derive a new optimization algorithm for neural network training that performs better than classical Adam on several regression and classification problems.
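The time-discretization reading of Adam can be sketched on a toy problem. The example below is an illustration under stated assumptions, not the paper's own formulation: it omits Adam's bias correction, uses a hypothetical scalar loss, and introduces the rate constants `a = (1 - beta1) / h` and `b = (1 - beta2) / h` purely for this sketch. With those choices, the moment updates read as explicit Euler steps for m' = a (g(θ) - m) and v' = b (g(θ)² - v), while the parameter update evaluates θ' = -m / (√v + ε) at the freshly updated moments, which is what makes the step implicit-explicit. The script checks numerically that this step reproduces the usual Adam recursion.

```python
import math


def grad(theta):
    # Gradient of the hypothetical toy loss f(theta) = 0.5 * (theta - 3)^2.
    return theta - 3.0


def adam_step(theta, m, v, h, beta1=0.9, beta2=0.999, eps=1e-8):
    # Classical Adam recursion (bias correction omitted as a simplification).
    g = grad(theta)
    m = beta1 * m + (1.0 - beta1) * g
    v = beta2 * v + (1.0 - beta2) * g * g
    theta = theta - h * m / (math.sqrt(v) + eps)
    return theta, m, v


def imex_euler_step(theta, m, v, h, a, b, eps=1e-8):
    # Explicit Euler step for the moment equations,
    #   m' = a * (g(theta) - m),   v' = b * (g(theta)^2 - v),
    # using the old theta, m, v on the right-hand side.
    g = grad(theta)
    m_new = m + h * a * (g - m)
    v_new = v + h * b * (g * g - v)
    # Parameter step theta' = -m / (sqrt(v) + eps), evaluated at the NEW
    # moments; this is the "implicit" part of the assumed IMEX splitting.
    theta_new = theta - h * m_new / (math.sqrt(v_new) + eps)
    return theta_new, m_new, v_new


def compare(n_steps=100, h=0.01, beta1=0.9, beta2=0.999):
    # Rate constants chosen so both schemes coincide for step size h
    # (names a, b are assumptions of this sketch).
    a = (1.0 - beta1) / h
    b = (1.0 - beta2) / h
    s_adam = (0.0, 0.0, 0.0)
    s_imex = (0.0, 0.0, 0.0)
    max_gap = 0.0
    for _ in range(n_steps):
        s_adam = adam_step(*s_adam, h, beta1, beta2)
        s_imex = imex_euler_step(*s_imex, h, a, b)
        max_gap = max(max_gap, abs(s_adam[0] - s_imex[0]))
    return s_adam[0], s_imex[0], max_gap


if __name__ == "__main__":
    theta_adam, theta_imex, gap = compare()
    print("Adam theta:", theta_adam)
    print("IMEX theta:", theta_imex)
    print("max |difference|:", gap)
```

The two trajectories agree up to floating-point rounding, since `m + h * a * (g - m)` is algebraically `beta1 * m + (1 - beta1) * g`. In this picture, a higher-order method would replace `imex_euler_step` with a more accurate IMEX integrator for the same system; the specific schemes the authors use are not given in the abstract above.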

Code Implementations: 1 repo
