CL AI · Jul 3, 2024

52B to 1T: Lessons Learned via Tele-FLM Series

Xiang Li, Yiqun Yao, Xin Jiang, Xuezhi Fang, Chao Wang, Xinzhang Liu, Zihan Wang, Yu Zhao, Xin Wang, Yuyao Huang, Shuangyong Song, Yongxiang Li

arXiv:2407.02783v1 · 7.7 · 12 citations · h-index: 63 · Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses the challenge of efficiently scaling LLMs for the AI research community, offering incremental insights and resources.

The paper tackles scaling large language models from 52 billion to 1 trillion parameters. It finds that "less is more" when constructing supervised fine-tuning data, provides best practices for progressively growing a model, and plans to open-source a 1T model checkpoint.

Large Language Models (LLMs) represent a significant stride toward Artificial General Intelligence. As scaling laws underscore the potential of increasing model sizes, the academic community has intensified its investigations into LLMs with more than 50 billion parameters. This technical report builds on our prior work with Tele-FLM (also known as FLM-2), a publicly available 52-billion-parameter model. We focus on two areas. First, we discuss our observations of Supervised Fine-tuning (SFT) on Tele-FLM-52B, which support the "less is more" approach to SFT data construction. Second, we present experiments and analyses on best practices for progressively growing a model from 52 billion to 102 billion, and subsequently to 1 trillion parameters. We will open-source a 1T model checkpoint, Tele-FLM-1T, to advance further training and research.

