CL | Jun 11, 2024

An Efficient Recipe for Long Context Extension via Middle-Focused Positional Encoding

arXiv:2406.07138v29.113 citations | Has Code
Originality: Incremental advance
AI Analysis

This work addresses the challenge of efficiently scaling LLMs to longer contexts, which matters for applications such as document analysis and long-form generation. Its contribution is an incremental improvement on existing positional encoding methods.

The paper tackles extending the context length of pre-trained large language models (LLMs) beyond 4K tokens while mitigating the 'Lost-in-the-Middle' issue. It reports extending Llama2-7B models to a target length of up to 256K tokens, which the authors describe as 'Never Miss A Beat' performance.

Recently, many methods have been developed to extend the context length of pre-trained large language models (LLMs), but they often require fine-tuning at the target length ($\gg4K$) and struggle to effectively utilize information from the middle part of the context. To address these issues, we propose $\textbf{C}$ontinuity-$\textbf{R}$elativity ind$\textbf{E}$xing with g$\textbf{A}$ussian $\textbf{M}$iddle ($\texttt{CREAM}$), which interpolates positional encodings by manipulating position indices. Apart from being simple, $\texttt{CREAM}$ is training-efficient: it only requires fine-tuning at the pre-trained context window (e.g., Llama 2-4K) and can extend LLMs to a much longer target context length (e.g., 256K). To ensure that the model focuses more on the information in the middle, we introduce a truncated Gaussian to encourage sampling from the middle part of the context during fine-tuning, thus alleviating the "Lost-in-the-Middle" problem faced by long-context LLMs. Experimental results show that $\texttt{CREAM}$ successfully extends LLMs to the target length for both Base and Chat versions of $\texttt{Llama2-7B}$ with "Never Miss A Beat". Our code is publicly available at https://github.com/bigai-nlco/cream.
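
The abstract's core idea, choosing position indices from the long target range while only ever feeding the model a pre-trained-window's worth of tokens, can be illustrated with a minimal sketch. This is not the authors' implementation (the linked repository is the reference for that). The head/middle/tail split, the `edge` size, the Gaussian's mean and `std_frac`, the rejection-sampling approach, and both function names are assumptions chosen purely for illustration.

```python
import random


def sample_truncated_gaussian(low, high, mean, std, rng):
    """Rejection-sample one value from N(mean, std^2) restricted to [low, high]."""
    while True:
        x = rng.gauss(mean, std)
        if low <= x <= high:
            return x


def build_position_ids(window, target, edge, rng, std_frac=0.15):
    """Return `window` increasing position indices drawn from [0, target).

    Assumed layout (illustrative only):
      head   : the first `edge` positions of the target range
      middle : a contiguous block whose start is sampled from a truncated
               Gaussian centred on the middle of the target range
      tail   : the last `edge` positions of the target range
    """
    mid_len = window - 2 * edge
    if mid_len <= 0 or target < window:
        raise ValueError("need window > 2 * edge and target >= window")

    # Valid start offsets for the middle block so it never overlaps head or tail.
    low = edge
    high = target - edge - mid_len
    mean = (low + high) / 2
    std = std_frac * (high - low)  # assumed spread; a tunable knob in this sketch

    start = int(round(sample_truncated_gaussian(low, high, mean, std, rng)))

    head = list(range(edge))
    middle = list(range(start, start + mid_len))
    tail = list(range(target - edge, target))
    return head + middle + tail
```

Under these assumptions, calling `build_position_ids(4096, 262144, 512, random.Random(0))` yields 4096 indices that span the 256K range, with the middle block landing near the centre more often than near either end. In a fine-tuning loop, such indices would be passed as the position indices of a 4K-token training sequence in place of the default `0..4095`.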

Code Implementations: 1 repo
