CL | Jan 4, 2024

LLaMA Pro: Progressive LLaMA with Block Expansion

Tencent
arXiv:2401.02415v2 | 116 citations | h-index: 44 | ACL
AI Analysis

This paper addresses catastrophic forgetting, a problem for developers and researchers who adapt large language models to new domains, though it appears incremental because it extends the existing LLaMA architecture.

The paper tackles catastrophic forgetting in large language models that are learning new skills. It proposes a post-pretraining method based on Transformer block expansion and produces LLaMA Pro-8.3B, which excels in general tasks, programming, and mathematics and outperforms existing open models in the LLaMA family.

Humans generally acquire new skills without compromising the old; however, the opposite holds for Large Language Models (LLMs), e.g., from LLaMA to CodeLLaMA. To this end, we propose a new post-pretraining method for LLMs with an expansion of Transformer blocks. We tune the expanded blocks using only new corpus, efficiently and effectively improving the model's knowledge without catastrophic forgetting. In this paper, we experiment on the corpus of code and math, yielding LLaMA Pro-8.3B, a versatile foundation model initialized from LLaMA2-7B, excelling in general tasks, programming, and mathematics. LLaMA Pro and its instruction-following counterpart (LLaMA Pro-Instruct) achieve advanced performance among various benchmarks, demonstrating superiority over existing open models in the LLaMA family and the immense potential of reasoning and addressing diverse tasks as an intelligent agent. Our findings provide valuable insights into integrating natural and programming languages, laying a solid foundation for developing advanced language agents that operate effectively in various environments.
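The block-expansion idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It uses scalar residual blocks instead of real Transformer blocks, and every name in it (`Block`, `expand_blocks`, `group_size`) is hypothetical. It also assumes a detail the abstract does not state: each added block is a copy of a neighbouring block with its output weight zeroed, so the expanded model initially computes the same function as the original. Only the added blocks are marked trainable, mirroring "we tune the expanded blocks using only new corpus".

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass
class Block:
    """Toy residual block on a scalar: y = x + w_out * relu(w_in * x)."""
    w_in: float
    w_out: float
    trainable: bool = False

    def forward(self, x: float) -> float:
        return x + self.w_out * max(0.0, self.w_in * x)


def expand_blocks(blocks: List[Block], group_size: int) -> List[Block]:
    """Insert one new block after every `group_size` original blocks.

    Assumption (not stated in the abstract): each new block copies the
    block before it and zeroes the output weight, so it starts as an
    identity map. Original blocks are frozen; new blocks are trainable.
    """
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    expanded: List[Block] = []
    for index, block in enumerate(blocks, start=1):
        expanded.append(replace(block, trainable=False))  # frozen original
        if index % group_size == 0:
            # Identity-initialized copy: the residual branch contributes 0.
            expanded.append(replace(block, w_out=0.0, trainable=True))
    return expanded


def run(blocks: List[Block], x: float) -> float:
    """Apply the blocks in order."""
    for block in blocks:
        x = block.forward(x)
    return x


# Hypothetical four-block "base model"; the weights are arbitrary.
base = [Block(0.5, 0.1), Block(-0.3, 0.2), Block(0.8, -0.4), Block(1.1, 0.3)]
expanded = expand_blocks(base, group_size=2)
```

Because the added blocks start as identity maps, `run(expanded, x)` equals `run(base, x)` before any training. An optimizer that updates only the blocks with `trainable=True` would then learn from the new corpus while the original weights stay untouched.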

Code Implementations: 1 repo
