CL · Sep 19, 2023

Baichuan 2: Open Large-scale Language Models

Peking U
arXiv:2309.10305v4 · 992 citations · h-index: 147 · Has Code
Originality: Incremental advance
AI Analysis

The paper provides open-source, multilingual LLMs that benefit the research community, though the advance is incremental because it builds on existing model architectures.

The paper addresses the scarcity of open-source LLMs with strong multilingual capabilities by introducing Baichuan 2, a series of 7B- and 13B-parameter models trained on 2.6 trillion tokens. The models match or outperform similar-sized open-source models on benchmarks such as MMLU and excel in domains such as medicine and law.

Large language models (LLMs) have demonstrated remarkable performance on a variety of natural language tasks based on just a few examples of natural language instructions, reducing the need for extensive feature engineering. However, most powerful LLMs are closed-source or limited in their capability for languages other than English. In this technical report, we present Baichuan 2, a series of large-scale multilingual language models containing 7 billion and 13 billion parameters, trained from scratch, on 2.6 trillion tokens. Baichuan 2 matches or outperforms other open-source models of similar size on public benchmarks like MMLU, CMMLU, GSM8K, and HumanEval. Furthermore, Baichuan 2 excels in vertical domains such as medicine and law. We will release all pre-training model checkpoints to benefit the research community in better understanding the training dynamics of Baichuan 2.

Code Implementations: 2 repos
