AdaFRUGAL: Adaptive Memory-Efficient Training with Dynamic Control
This work addresses the high memory and computational cost of LLM training in resource-constrained settings, offering an incremental improvement over existing methods.
The paper tackles the memory-intensive training of Large Language Models by automating hyperparameter tuning in the FRUGAL framework. The resulting method, AdaFRUGAL, maintains competitive performance while significantly reducing GPU memory and training time across pre-training and fine-tuning tasks.
Training Large Language Models (LLMs) is highly memory-intensive due to optimizer state overhead. The FRUGAL framework mitigates this with gradient splitting, but its static hyperparameters -- the subspace ratio ($ρ$) and update frequency ($T$) -- require costly manual tuning, limiting adaptability. We present AdaFRUGAL, which automates this process by introducing two dynamic controls: (i) a linear decay for $ρ$ to progressively reduce memory, and (ii) a loss-aware schedule for $T$ to lower computational overhead. Experiments across large-scale pre-training (English C4, Vietnamese VietVault) and fine-tuning (GLUE) demonstrate that AdaFRUGAL achieves a compelling trade-off. It maintains competitive performance against AdamW and static FRUGAL while significantly reducing both GPU memory and training time, offering a more practical, autonomous solution for resource-constrained LLM training.
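To make the two dynamic controls concrete, the following is a minimal Python sketch, not the implementation evaluated in this work. The abstract does not specify the exact schedules, so everything here is an assumption for illustration: the function and class names (`linear_rho`, `LossAwareInterval`), the default values, and in particular the rule that $T$ is multiplied by a fixed factor whenever the relative loss improvement falls below a tolerance.

```python
def linear_rho(step, total_steps, rho_start=0.25, rho_end=0.05):
    """Linearly decay the subspace ratio rho from rho_start to rho_end.

    All default values are illustrative assumptions, not values from the paper.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp training progress to [0, 1] so rho never leaves the chosen range.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return rho_start + (rho_end - rho_start) * progress


class LossAwareInterval:
    """Hypothetical loss-aware schedule for the update frequency T.

    Assumed rule: when the relative loss improvement between two checks
    drops below `tol`, training is treated as having stabilized and T is
    multiplied by `growth` (capped at `t_max`), so the subspace is
    refreshed less often and the associated overhead shrinks.
    """

    def __init__(self, t_init=200, t_max=1600, growth=2, tol=0.01):
        self.t = t_init
        self.t_max = t_max
        self.growth = growth
        self.tol = tol
        self.prev_loss = None

    def update(self, loss):
        """Record a new loss value and return the current interval T."""
        if self.prev_loss is not None:
            denom = max(abs(self.prev_loss), 1e-12)  # avoid division by zero
            rel_improvement = (self.prev_loss - loss) / denom
            if rel_improvement < self.tol:
                self.t = min(self.t * self.growth, self.t_max)
        self.prev_loss = loss
        return self.t
```

In a training loop, `linear_rho(step, total_steps)` would be queried whenever the state-full subspace is re-selected, and `LossAwareInterval.update` would be called with a smoothed training or validation loss at each check. How these values feed into FRUGAL's gradient splitting is outside the scope of this sketch.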