CL | AI | LG | Aug 31, 2023

YaRN: Efficient Context Window Extension of Large Language Models

arXiv:2309.00071v3 | 555 citations | h-index: 5 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses a key limitation for users of transformer-based models by enabling efficient handling of longer sequences, though it is an incremental improvement over existing RoPE extension methods.

The paper tackles the problem of extending the context window of large language models beyond their training length, achieving this with YaRN, which requires 10x fewer tokens and 2.5x fewer training steps than previous methods while surpassing state-of-the-art performance.

Rotary Position Embeddings (RoPE) have been shown to effectively encode positional information in transformer-based language models. However, these models fail to generalize past the sequence length they were trained on. We present YaRN (Yet another RoPE extensioN method), a compute-efficient method to extend the context window of such models, requiring 10x fewer tokens and 2.5x fewer training steps than previous methods. Using YaRN, we show that LLaMA models can effectively utilize and extrapolate to context lengths much longer than their original pre-training would allow, while also surpassing the previous state of the art at context window extension. In addition, we demonstrate that YaRN exhibits the capability to extrapolate beyond the limited context of a fine-tuning dataset. Code is available at https://github.com/jquesnelle/yarn
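
The abstract does not describe the mechanism itself. As a rough illustration of how a RoPE extension method of this kind can rescale rotary frequencies, the Python sketch below blends interpolated and unmodified frequencies per dimension. The function names are hypothetical, and the ramp thresholds (alpha = 1, beta = 32) and the 0.1 ln(s) + 1 attention factor are assumptions recalled from the paper's LLaMA settings, not values stated on this page; the linked repository remains the authoritative implementation.

```python
import math

def rope_inv_freq(dim, base=10000.0):
    """Standard RoPE inverse frequencies: theta_i = base^(-2i/dim)."""
    return [base ** (-2.0 * i / dim) for i in range(dim // 2)]

def yarn_style_inv_freq(dim, orig_ctx, scale, base=10000.0, alpha=1.0, beta=32.0):
    """Blend interpolated (theta / scale) and original frequencies.

    alpha and beta are assumed thresholds on how many full rotations a
    dimension completes within the original context length.
    """
    blended = []
    for theta in rope_inv_freq(dim, base):
        wavelength = 2.0 * math.pi / theta
        rotations = orig_ctx / wavelength
        # ramp = 0: fully interpolate (long wavelengths);
        # ramp = 1: leave untouched (short wavelengths).
        ramp = min(1.0, max(0.0, (rotations - alpha) / (beta - alpha)))
        blended.append((1.0 - ramp) * theta / scale + ramp * theta)
    return blended

def attention_factor(scale):
    """Assumed scaling applied to the rotary embeddings for scale factor s."""
    return 0.1 * math.log(scale) + 1.0

if __name__ == "__main__":
    # Example: a 128-dim head, 4096-token original context, 16x extension.
    freqs = yarn_style_inv_freq(dim=128, orig_ctx=4096, scale=16.0)
    print(freqs[0], freqs[-1], attention_factor(16.0))
```

In this sketch the highest-frequency dimensions keep their original rotation speed, while the lowest-frequency ones are slowed by the full scale factor so that extended positions stay within the range seen during pre-training.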

Code Implementations: 10 repos

Data from Papers with Code (CC-BY-SA-4.0)

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
