DC AI LG PF

Jun 12, 2024

ProTrain: Efficient LLM Training via Memory-Aware Techniques

Hanmei Yang, Jin Zhou, Yao Fu, Xiaoqun Wang, Ramine Roane, Hui Guan, Tongping Liu

arXiv:2406.08334v1

4.33 citations

Originality: Incremental advance

AI Analysis

This work addresses inefficient memory management in LLM training, a problem facing both researchers and practitioners, and offers an incremental improvement over existing frameworks.

The paper tackles the memory-intensive training of Large Language Models by proposing ProTrain, a system that improves training throughput by 1.43x to 2.71x compared to state-of-the-art methods without compromising accuracy.

Training Large Language Models (LLMs) is extremely memory-intensive. To address this, existing work such as ZeRO-Offload combines CPU and GPU resources for training. Such techniques largely democratize billion-scale model training, making it possible to train with a few consumer graphics cards. However, based on our observation, existing frameworks often provide coarse-grained memory management and require experienced experts for configuration tuning, leading to suboptimal hardware utilization and performance. This paper proposes ProTrain, a novel training system that intelligently balances memory usage and performance by coordinating memory, computation, and IO. ProTrain achieves adaptive memory management through Chunk-Based Model State Management and Block-Wise Activation Management, guided by a Memory-Aware Runtime Profiler, without user intervention. ProTrain does not change the training algorithm and thus does not compromise accuracy. Experiments show that ProTrain improves training throughput by 1.43x to 2.71x compared to the SOTA training systems.
