Tele-FLM Technical Report
Tele-FLM provides an efficient, open-sourced approach to scaling LLMs. It benefits researchers and developers in academia and industry by reducing trial-and-error costs and computational resources.
The authors address the lack of detailed, open-sourced methods for efficiently scaling large language models beyond 50 billion parameters. They introduce Tele-FLM, a 52B multilingual model that demonstrates superior multilingual language modeling abilities. In English and Chinese foundation model evaluations, it is comparable to models trained with more pre-training FLOPs, such as Llama2-70B and DeepSeek-67B.
Large language models (LLMs) have showcased profound capabilities in language understanding and generation, facilitating a wide array of applications. However, there is a notable paucity of detailed, open-sourced methodologies for efficiently scaling LLMs beyond 50 billion parameters with minimal trial-and-error cost and computational resources. In this report, we introduce Tele-FLM (aka FLM-2), a 52B open-sourced multilingual large language model that features a stable, efficient pre-training paradigm and enhanced factual judgment capabilities. Tele-FLM demonstrates superior multilingual language modeling abilities, as measured by bits per byte (BPB) on textual corpora. In addition, in both English and Chinese foundation model evaluations, it is comparable to strong open-sourced models that involve larger pre-training FLOPs, such as Llama2-70B and DeepSeek-67B. Along with the model weights, we share the core designs, engineering practices, and training details, which we expect to benefit both the academic and industrial communities.
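The BPB metric mentioned above can be illustrated with a minimal sketch. It assumes the common definition of bits per byte: the total negative log-likelihood a model assigns to a text, converted from nats to bits and divided by the number of UTF-8 bytes in that text. The function name `bits_per_byte` and its inputs are hypothetical and are not taken from the Tele-FLM code; the report's exact evaluation procedure is not described in this section.

```python
import math


def bits_per_byte(token_nll_nats, text):
    """Compute bits per byte (BPB) for one piece of text.

    token_nll_nats: per-token negative log-likelihoods in nats
                    (assumed to come from some language model).
    text:           the original string that was tokenized.
    """
    n_bytes = len(text.encode("utf-8"))  # normalize by bytes, not tokens
    if n_bytes == 0:
        raise ValueError("text must be non-empty")
    total_bits = sum(token_nll_nats) / math.log(2)  # nats -> bits
    return total_bits / n_bytes


# Hypothetical example: 8 tokens, each costing exactly 1 bit (ln 2 nats),
# spread over a 4-byte ASCII string.
example_bpb = bits_per_byte([math.log(2)] * 8, "abcd")
```

Because the denominator counts bytes rather than tokens, the metric does not depend on a model's tokenizer, which is what makes it usable for comparing models with different vocabularies across languages. Lower BPB means better compression of the text.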