LG · OC · Feb 11, 2025

Training Deep Learning Models with Norm-Constrained LMOs

Thomas Pethick, Wanyun Xie, Kimon Antonakopoulos, Zhenyu Zhu, Antonio Silveti-Falls, Volkan Cevher

arXiv:2502.07529v244.9153 citations · h-index: 61 · Has Code · ICML

Originality: Incremental advance

AI Analysis

This work proposes a family of stochastic optimizers built on the linear minimization oracle (LMO) over a norm-ball, aimed at faster training and hyperparameter transfer across model sizes. It is rated an incremental advance because it builds on existing LMO-based frameworks.

The authors propose a new family of stochastic optimization algorithms that use a linear minimization oracle over a norm-ball. The resulting update rule unifies several existing methods, and an explicit choice of norm for deep architectures enables hyperparameter transfer across model sizes. They report significant speedups in nanoGPT training with their memory-efficient algorithm, Scion, without relying on Adam.

In this work, we study optimization methods that leverage the linear minimization oracle (LMO) over a norm-ball. We propose a new stochastic family of algorithms that uses the LMO to adapt to the geometry of the problem and, perhaps surprisingly, show that they can be applied to unconstrained problems. The resulting update rule unifies several existing optimization methods under a single framework. Furthermore, we propose an explicit choice of norm for deep architectures, which, as a side benefit, leads to the transferability of hyperparameters across model sizes. Experimentally, we demonstrate significant speedups on nanoGPT training using our algorithm, Scion, without any reliance on Adam. The proposed method is memory-efficient, requiring only one set of model weights and one set of gradients, which can be stored in half-precision. The code is available at https://github.com/LIONS-EPFL/scion .
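To make the central object concrete: the LMO over a norm-ball of radius ρ returns a point x that minimizes the inner product ⟨d, x⟩ subject to ‖x‖ ≤ ρ. The following is a minimal sketch, not the Scion implementation. The function names (`lmo_linf`, `lmo_l2`, `lmo_step`), the ℓ∞ and ℓ2 norms, and the averaged-gradient update form are assumptions chosen for illustration. The abstract does not spell out the update rule or the norm proposed for deep architectures.

```python
import math


def lmo_linf(d, radius=1.0):
    """LMO over the l-infinity ball: argmin of <d, x> subject to max|x_i| <= radius."""
    # Each coordinate is minimized independently by moving against the sign of d_i.
    return [-radius * ((v > 0) - (v < 0)) for v in d]


def lmo_l2(d, radius=1.0):
    """LMO over the Euclidean ball: argmin of <d, x> subject to ||x||_2 <= radius."""
    norm = math.sqrt(sum(v * v for v in d))
    if norm == 0.0:
        # Every feasible point is optimal when d is zero; return the origin.
        return [0.0 for _ in d]
    return [-radius * v / norm for v in d]


def lmo_step(x, d, grad, lmo, step_size=0.1, momentum=0.1, radius=1.0):
    """One unconstrained LMO step (assumed form, for illustration only).

    d is a running average of stochastic gradients; the iterate moves along
    the LMO output instead of along the raw gradient.
    """
    d = [(1.0 - momentum) * di + momentum * gi for di, gi in zip(d, grad)]
    direction = lmo(d, radius)
    x = [xi + step_size * ui for xi, ui in zip(x, direction)]
    return x, d


# Usage: a few steps on f(x) = 0.5 * ||x - target||^2 (a toy objective).
target = [3.0, -2.0]
x, d = [0.0, 0.0], [0.0, 0.0]
for _ in range(20):
    grad = [xi - ti for xi, ti in zip(x, target)]
    x, d = lmo_step(x, d, grad, lmo_linf, step_size=0.05)
```

With the ℓ∞ ball the step reduces to a sign-based update, and with the ℓ2 ball it reduces to a normalized-gradient update, which illustrates how one LMO-based rule can recover different known methods by changing the norm. Only the averaged direction `d` is kept alongside the weights in this sketch.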

