CL, AI, HC | Nov 27, 2023

YUAN 2.0: A Large Language Model with Localized Filtering-based Attention

arXiv:2311.15786v4 | 9 citations | h-index: 4
Originality: Incremental advance
AI Analysis

This work addresses the need for more efficient and capable large language models for tasks such as coding and math. Its contribution appears incremental: refinements to the attention mechanism and to training methods.

The authors developed Yuan 2.0, a series of large language models with up to 102.6 billion parameters. It introduces Localized Filtering-based Attention (LFA) to incorporate local language dependencies, along with a data filtering system for building high-quality datasets. The authors report strong performance in code generation, math problem-solving, and chatting compared with existing models.

In this work, we develop and release Yuan 2.0, a series of large language models with parameters ranging from 2.1 billion to 102.6 billion. Localized Filtering-based Attention (LFA) is introduced to incorporate prior knowledge of the local dependencies of natural language into attention. A data filtering and generation system is presented to build high-quality pre-training and fine-tuning datasets. A distributed training method combining non-uniform pipeline parallelism, data parallelism, and optimizer parallelism is proposed, which greatly reduces the bandwidth requirements of intra-node communication and achieves good performance in large-scale distributed training. Yuan 2.0 models display impressive ability in code generation, math problem-solving, and chatting compared with existing models. The latest version of Yuan 2.0, including model weights and source code, is accessible on GitHub.
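The abstract does not say how LFA injects local dependencies, so the following is only a minimal sketch of the general idea, under stated assumptions: a short causal filter mixes each token with its immediate predecessors, and the filtered sequence is used to form queries and keys for ordinary causal self-attention. The function names, the kernel weights `(0.5, 0.5)`, the identity Q/K/V projections, and the single head are all hypothetical choices made for illustration, not the authors' implementation.

```python
import math

def causal_local_filter(xs, weights):
    """Mix each position with its predecessors using a short causal kernel.

    xs: list of token vectors (lists of floats).
    weights: kernel taps; weights[0] applies to the current token,
    weights[1] to the previous one, and so on. Positions before the
    start of the sequence are treated as zeros.
    """
    out = []
    for t in range(len(xs)):
        mixed = [0.0] * len(xs[t])
        for k, w in enumerate(weights):
            if t - k >= 0:  # never look at future tokens
                for d in range(len(mixed)):
                    mixed[d] += w * xs[t - k][d]
        out.append(mixed)
    return out

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def lfa_like_attention(xs, weights=(0.5, 0.5)):
    """Causal self-attention with queries and keys taken from filtered inputs.

    Assumptions (illustrative only): identity projections, one head,
    values taken from the unfiltered inputs.
    """
    filtered = causal_local_filter(xs, weights)
    dim = len(xs[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for t in range(len(xs)):
        # Score position t against positions 0..t only (causal mask).
        scores = [scale * sum(q * k for q, k in zip(filtered[t], filtered[s]))
                  for s in range(t + 1)]
        probs = softmax(scores)
        outputs.append([sum(p * xs[s][d] for s, p in enumerate(probs))
                        for d in range(dim)])
    return outputs
```

Because both the filter and the attention mask look only backwards, changing a later token leaves the outputs at earlier positions unchanged, which is the property an autoregressive language model needs.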

Code Implementations: 2 repos
