CL · AI · Sep 7, 2023

FLM-101B: An Open LLM and How to Train It with $100K Budget

Tencent · Tsinghua
arXiv:2309.03852v3 · 29 citations · h-index: 63 · Has Code
Originality: Highly original
AI Analysis

This work addresses the financial and environmental costs of LLM pre-training for the AI research community, offering a more efficient approach.

The paper tackles the high computational cost of training large language models (LLMs) by introducing FLM-101B, a model trained with a progressive growth strategy on a $100K budget, achieving 80% of baseline performance with only 10% of the floating-point operations.

Large language models (LLMs) are considered important approaches towards foundational machine intelligence, achieving remarkable success in Natural Language Processing and multimodal tasks, among others. However, the carbon footprint and financial costs of heavy pre-training computation are a non-negligible issue. Progressive training methods, inspired by the neurogenesis process that grows neural structures, have shown potential to accelerate LLM pre-training. However, the algorithms, implementation, and practices for progressively training LLMs beyond 100B parameters remain underexplored. In this paper, we show that our model, namely FLM-101B, trained with our growth strategy under a budget of $100K, reaches 80% of the baselines' performance with only 10% of their floating-point operations. We believe that further studies on progressive training will benefit the community by cutting down the costs and promoting green AI. The checkpoint of FLM-101B is released at https://huggingface.co/CofeAI/FLM-101B.
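
To make the cost argument concrete, here is a minimal sketch of why growing a model during training can reduce compute. It assumes the common rule of thumb that training costs about 6 × parameters × tokens FLOPs. The stage sizes and token counts are hypothetical, chosen only for illustration; they are not the schedule reported in the paper, and the names `train_flops` and `stages` are likewise illustrative.

```python
def train_flops(params: float, tokens: float) -> float:
    """Approximate training FLOPs using the 6 * N * D rule of thumb."""
    return 6.0 * params * tokens


# Hypothetical growth schedule (NOT the paper's numbers):
# each entry is (parameter count, tokens trained at that size).
stages = [
    (16e9, 200e9),   # small model sees most of the tokens
    (50e9, 50e9),    # grown once
    (100e9, 50e9),   # grown to the final size
]

# Progressive training: pay for each stage at its own model size.
progressive = sum(train_flops(n, d) for n, d in stages)

# Fixed-size training: the final-size model sees every token.
total_tokens = sum(d for _, d in stages)
fixed = train_flops(stages[-1][0], total_tokens)

# Fraction of compute saved by growing instead of training at full size.
savings = 1.0 - progressive / fixed

print(f"progressive: {progressive:.3e} FLOPs")
print(f"fixed-size:  {fixed:.3e} FLOPs")
print(f"savings:     {savings:.1%}")
```

With these made-up numbers the progressive schedule needs roughly 36% of the fixed-size compute. The 10% figure in the abstract is measured against the paper's baselines and is not reproduced by this sketch.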

