LG · Feb 15, 2025

Preconditioned Inexact Stochastic ADMM for Deep Model

Shenglong Zhou, Ouya Wang, Ziyan Luo, Yongxu Zhu, Geoffrey Ye Li

arXiv:2502.10784v4 · 11.43 citations · h-index: 21 · Nat Mach Intell

Originality: Incremental advance

AI Analysis

This work addresses data heterogeneity challenges in distributed deep learning, offering a scalable optimizer with strong theoretical guarantees, though it appears incremental as it builds on existing ADMM methods.

The paper tackles slow convergence and data heterogeneity in training foundation models by proposing PISA, a preconditioned inexact stochastic ADMM algorithm. Its two variants, SISA and NSISA, outperform various state-of-the-art optimizers in the authors' experiments across diverse deep models.

The recent advancement of foundation models (FMs) has brought about a paradigm shift, revolutionizing various sectors worldwide. The popular optimizers used to train these models are stochastic gradient descent-based algorithms, which face inherent limitations, such as slow convergence and stringent assumptions for convergence. In particular, data heterogeneity arising from distributed settings poses significant challenges to their theoretical and numerical performance. This paper develops an algorithm, PISA (Preconditioned Inexact Stochastic Alternating Direction Method of Multipliers). Grounded in rigorous theoretical guarantees, the algorithm converges under the sole assumption of Lipschitz continuity of the gradient on a bounded region, thereby removing the need for other conditions commonly imposed by stochastic methods. This capability enables the proposed algorithm to tackle the challenge of data heterogeneity effectively. Moreover, the algorithmic architecture enables scalable parallel computing and supports various preconditions, such as second-order information, second moment, and orthogonalized momentum by Newton-Schulz iterations. Incorporating the latter two preconditions in PISA yields two computationally efficient variants: SISA and NSISA. Comprehensive experimental evaluations for training or fine-tuning diverse deep models, including vision models, large language models, reinforcement learning models, generative adversarial networks, and recurrent neural networks, demonstrate superior numerical performance of SISA and NSISA compared to various state-of-the-art optimizers.
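The abstract does not state PISA's update rules, so the following is only a minimal sketch of the classic scaled-form consensus ADMM that such methods build on, not the authors' algorithm. It omits preconditioning, inexact subproblem solves, and stochastic sampling. The function name `consensus_admm`, the quadratic local losses, and the parameters `rho` and `iters` are assumptions made for illustration. Each worker holds a different target, which stands in for heterogeneous local data.

```python
def consensus_admm(local_targets, rho=1.0, iters=100):
    """Textbook scaled-form consensus ADMM on a toy problem (not PISA).

    Worker i holds f_i(w) = 0.5 * (w - a_i)**2 with its own target a_i,
    so differing a_i play the role of heterogeneous local data.
    """
    m = len(local_targets)
    w = [0.0] * m   # local copies of the parameter
    u = [0.0] * m   # scaled dual variables, one per worker
    z = 0.0         # shared (consensus) parameter
    for _ in range(iters):
        # Local step: closed-form minimizer of
        # f_i(w) + (rho / 2) * (w - z + u_i)**2 for each worker.
        w = [(a + rho * (z - ui)) / (1.0 + rho)
             for a, ui in zip(local_targets, u)]
        # Global step: average the dual-shifted local copies.
        z = sum(wi + ui for wi, ui in zip(w, u)) / m
        # Dual step: accumulate each worker's disagreement with z.
        u = [ui + wi - z for ui, wi in zip(u, w)]
    return z, w


if __name__ == "__main__":
    z, w = consensus_admm([1.0, 2.0, 6.0])
    print(z, w)
```

In this toy setting the shared parameter approaches the mean of the local targets even though no worker's own minimizer equals it. The local step touches only one worker's data at a time, which is the property that allows the per-worker updates to run in parallel.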
