CL | LG | Feb 3, 2024

Beyond the Limits: A Survey of Techniques to Extend the Context Length in Large Language Models

arXiv:2402.02244v3 | 103 citations | h-index: 16 | IJCAI
Originality: Synthesis-oriented
AI Analysis

The survey addresses the limited sequence length of LLMs for researchers and practitioners, but it is incremental: it compiles existing methods rather than introducing new ones.

This survey reviews techniques that extend the context length of large language models and improve long-context understanding without a proportional increase in computational and memory requirements.

Recently, large language models (LLMs) have shown remarkable capabilities, including understanding context, engaging in logical reasoning, and generating responses. However, these capabilities come at the expense of stringent computational and memory requirements, which hinder their ability to support long input sequences effectively. This survey provides an inclusive review of recent techniques and methods devised to extend the sequence length in LLMs, thereby enhancing their capacity for long-context understanding. In particular, we review and categorize a wide range of techniques, including architectural modifications such as modified positional encoding and altered attention mechanisms, which are designed to improve the processing of longer sequences while avoiding a proportional increase in computational requirements. The diverse methodologies investigated in this study can be leveraged across different phases of LLMs, i.e., training, fine-tuning, and inference, enabling LLMs to process extended sequences efficiently. The limitations of the current methodologies are discussed in the last section, along with suggestions for future research directions, underscoring the importance of sequence length in the continued advancement of LLMs.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
