LG · Jun 14, 2024

Memory-Efficient Optimization with Factorized Hamiltonian Descent

arXiv:2406.09958v3 · 12 citations
Originality: Incremental advance
AI Analysis

This work addresses a critical challenge in training large-scale neural networks, the memory consumed by optimizer states. It appears to be an incremental advance, however, because it builds on existing adaptive optimizer frameworks.

The paper tackles the high memory overhead of adaptive optimizers such as Adam by introducing H-Fac, an optimizer that uses a rank-1 factorization of its optimizer states. This reduces memory costs to sublinear levels while maintaining competitive performance across various architectures.

Modern deep learning heavily depends on adaptive optimizers such as Adam and its variants, which are renowned for their capacity to handle model scaling and streamline hyperparameter tuning. However, these algorithms typically experience high memory overhead caused by the accumulation of optimization states, leading to a critical challenge in training large-scale network models. In this study, we introduce a novel adaptive optimizer, H-Fac, which incorporates a memory-efficient factorization approach to address this challenge. By employing a rank-1 parameterization for both momentum and scaling parameter estimators, H-Fac reduces memory costs to a sublinear level while maintaining competitive performance across a wide range of architectures. We develop our algorithms based on principles derived from Hamiltonian dynamics, providing robust theoretical underpinnings in optimization dynamics and convergence guarantees. These optimization algorithms are designed to be both straightforward and adaptable, facilitating easy implementation in diverse settings.
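The abstract does not spell out H-Fac's update rule. To illustrate why a rank-1 parameterization makes optimizer memory sublinear, the sketch below uses the well-known Adafactor-style factorization for the second-moment (scaling) estimate only. Instead of storing an m × n matrix of squared-gradient averages, it keeps one running sum per row and one per column, then rebuilds the matrix as a normalized outer product. This is an assumption-laden stand-in, not H-Fac itself: it omits the factorized momentum and the Hamiltonian-dynamics derivation. The names `FactoredSecondMoment`, `row_col_sums`, `rank1_estimate`, and `state_floats` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def row_col_sums(sq: Matrix) -> Tuple[List[float], List[float]]:
    """Row and column sums of a non-negative m x n matrix: only m + n numbers."""
    rows = [sum(row) for row in sq]
    cols = [sum(row[j] for row in sq) for j in range(len(sq[0]))]
    return rows, cols


def rank1_estimate(rows: List[float], cols: List[float]) -> Matrix:
    """Rebuild an m x n estimate V ~= r c^T / sum(r) from the two factors."""
    total = sum(rows)
    return [[r * c / total for c in cols] for r in rows]


class FactoredSecondMoment:
    """Hypothetical Adafactor-style scaler (not H-Fac): EMAs of row/column
    sums of grad**2 replace the full m x n second-moment matrix."""

    def __init__(self, m: int, n: int, beta2: float = 0.999, eps: float = 1e-30):
        self.r = [0.0] * m  # one accumulator per row
        self.c = [0.0] * n  # one accumulator per column
        self.beta2 = beta2
        self.eps = eps
        self.t = 0  # step counter for bias correction

    def update(self, grad: Matrix) -> Matrix:
        """Update the factors with grad**2 and return grad / sqrt(V_hat)."""
        self.t += 1
        sq = [[g * g + self.eps for g in row] for row in grad]
        rows, cols = row_col_sums(sq)
        b = self.beta2
        self.r = [b * old + (1 - b) * new for old, new in zip(self.r, rows)]
        self.c = [b * old + (1 - b) * new for old, new in zip(self.c, cols)]
        # V_hat is linear in the shared EMA scale, so one bias correction suffices.
        correction = 1 - b ** self.t
        v_hat = rank1_estimate(self.r, self.c)
        return [[g / math.sqrt(v / correction) for g, v in zip(grow, vrow)]
                for grow, vrow in zip(grad, v_hat)]


def state_floats(m: int, n: int) -> Dict[str, int]:
    """Floats needed for one second-moment state of an m x n weight matrix."""
    return {"full": m * n, "factored": m + n}


if __name__ == "__main__":
    opt = FactoredSecondMoment(2, 2)
    # grad**2 here is exactly rank-1, so every scaled entry comes out as ~1.0.
    print(opt.update([[1.0, 2.0], [2.0, 4.0]]))
    print(state_floats(4096, 4096))  # {'full': 16777216, 'factored': 8192}
```

For a 4096 × 4096 weight matrix, the factored state needs 8,192 floats instead of 16,777,216. The reconstruction is exact only when the squared gradients happen to form a rank-1 matrix; otherwise it is an approximation, which is the trade-off any rank-1 scheme makes for sublinear memory.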
