CL | AI | Apr 25, 2024

Tele-FLM Technical Report

Xiang Li, Yiqun Yao, Xin Jiang, Xuezhi Fang, Chao Wang, Xinzhang Liu, Zihan Wang, Yu Zhao, Xin Wang, Yuyao Huang, Shuangyong Song, Yongxiang Li

arXiv:2404.16645v19.111 citations | h-index: 62 | Has Code

Originality: Incremental advance

AI Analysis

Tele-FLM provides an efficient, open-sourced approach to scaling LLMs. It benefits researchers and developers in academia and industry by reducing trial-and-error costs and computational resources.

The authors address the lack of open-sourced methods for efficiently scaling large language models beyond 50 billion parameters by introducing Tele-FLM, a 52B multilingual model. It demonstrates superior multilingual language modeling abilities and, in evaluations, is comparable to larger models such as Llama2-70B and DeepSeek-67B.

Large language models (LLMs) have showcased profound capabilities in language understanding and generation, facilitating a wide array of applications. However, there is a notable paucity of detailed, open-sourced methodologies on efficiently scaling LLMs beyond 50 billion parameters with minimum trial-and-error cost and computational resources. In this report, we introduce Tele-FLM (aka FLM-2), a 52B open-sourced multilingual large language model that features a stable, efficient pre-training paradigm and enhanced factual judgment capabilities. Tele-FLM demonstrates superior multilingual language modeling abilities, measured by BPB on textual corpus. Besides, in both English and Chinese foundation model evaluation, it is comparable to strong open-sourced models that involve larger pre-training FLOPs, such as Llama2-70B and DeepSeek-67B. In addition to the model weights, we share the core designs, engineering practices, and training details, which we expect to benefit both the academic and industrial communities.
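The abstract reports language modeling quality in BPB (bits per byte). As a minimal sketch of how that metric is commonly computed, assuming per-token negative log-likelihoods in nats are already available from some model: sum the losses, convert nats to bits, and divide by the UTF-8 byte length of the text. The function name `bits_per_byte` and its arguments are hypothetical and not taken from the report, and the report's exact evaluation protocol may differ.

```python
import math


def bits_per_byte(token_nll_nats, text):
    """Return bits per byte (BPB) for `text`.

    token_nll_nats: per-token negative log-likelihoods in nats
                    (hypothetical input, e.g. from a model's cross-entropy loss).
    text:           the raw string that was tokenized and scored.
    """
    n_bytes = len(text.encode("utf-8"))
    if n_bytes == 0:
        raise ValueError("text must be non-empty")
    # Convert total loss from nats to bits, then normalize by byte count.
    total_bits = sum(token_nll_nats) / math.log(2)
    return total_bits / n_bytes


# Usage: two tokens, each costing exactly one bit, over a 4-byte string.
example_bpb = bits_per_byte([math.log(2), math.log(2)], "abcd")
```

Normalizing by bytes rather than tokens keeps the number comparable across models with different tokenizers, which matters for multilingual comparisons where tokens per character vary widely between languages.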
