LG · AI · Dec 12, 2024

SMMF: Square-Matricized Momentum Factorization for Memory-Efficient Optimization

arXiv:2412.08894v2 · 3 citations · h-index: 1 · AAAI
Originality: Incremental advance
AI Analysis

SMMF addresses the memory constraints faced by deep learning practitioners and applies flexibly across model architectures. The advance is incremental, since it builds on existing memory-efficient optimizers.

The paper tackles the problem of high memory usage in adaptive learning rate optimizers like Adam by proposing SMMF, a memory-efficient optimizer that reduces memory requirements by up to 96% while achieving comparable performance on CNN and Transformer tasks.

We propose SMMF (Square-Matricized Momentum Factorization), a memory-efficient optimizer that reduces the memory requirement of widely used adaptive learning rate optimizers, such as Adam, by up to 96%. SMMF enables flexible and efficient factorization of first and second momentum tensors of arbitrary rank (shape) during optimization, based on the proposed square-matricization and one-time single matrix factorization. As a result, it is effectively applicable to momentum tensors of any rank (shape), i.e., biases, matrices, and rank-d tensors, which are prevalent in various deep model architectures, such as CNNs (high rank) and Transformers (low rank). This contrasts with existing memory-efficient optimizers, which apply only to a particular (rank-2) momentum tensor, e.g., that of a linear layer. We conduct a regret bound analysis of SMMF, which shows that it converges similarly to non-memory-efficient adaptive learning rate optimizers, such as AdamNC, providing a theoretical basis for its competitive optimization capability. In our experiments, SMMF takes up to 96% less memory compared to state-of-the-art memory-efficient optimizers, e.g., Adafactor, CAME, and SM3, while achieving comparable model performance on various CNN and Transformer tasks.
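The abstract does not spell out the two steps it names, so the following is only a rough sketch under stated assumptions, not the authors' implementation. It assumes that square-matricization means reshaping a tensor's elements into the most nearly square matrix that its element count allows, and that the single factorization is an Adafactor-style rank-1 factorization of a non-negative matrix into row sums and column sums. All function names (`square_shape`, `factored_state_size`, `rank1_factorize`, `rank1_reconstruct`) are hypothetical and chosen for illustration.

```python
from __future__ import annotations

import math


def square_shape(numel: int) -> tuple[int, int]:
    """Return (rows, cols) with rows * cols == numel and rows <= cols,
    picking the factor pair closest to a square (assumed definition)."""
    if numel < 1:
        raise ValueError("numel must be positive")
    rows = math.isqrt(numel)
    # Walk down from floor(sqrt(numel)) to the nearest divisor.
    while numel % rows != 0:
        rows -= 1
    return rows, numel // rows


def factored_state_size(shape: tuple[int, ...]) -> tuple[int, int]:
    """Return (full element count, element count of the two factor vectors)."""
    numel = math.prod(shape)
    rows, cols = square_shape(numel)
    return numel, rows + cols


def rank1_factorize(matrix: list[list[float]]) -> tuple[list[float], list[float]]:
    """Adafactor-style rank-1 factors of a non-negative matrix:
    the vector of row sums and the vector of column sums."""
    row_sums = [sum(r) for r in matrix]
    col_sums = [sum(c) for c in zip(*matrix)]
    return row_sums, col_sums


def rank1_reconstruct(row_sums: list[float], col_sums: list[float]) -> list[list[float]]:
    """Approximate the matrix as outer(row_sums, col_sums) / total."""
    total = sum(row_sums)
    return [[r * c / total for c in col_sums] for r in row_sums]


# A rank-4 convolution weight of shape (64, 3, 7, 7) has 9408 elements.
print(square_shape(9408))                 # (96, 98)
print(factored_state_size((64, 3, 7, 7)))  # (9408, 194)
```

Under these assumptions, a rank-4 convolution kernel is handled the same way as a linear-layer matrix: only the element count matters, and the stored state shrinks from 9408 values to 194. The figure is illustrative arithmetic for this sketch and is not a result reported in the paper.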

Code Implementations: 1 repo
