CL LGFeb 3, 2024

Beyond the Limits: A Survey of Techniques to Extend the Context Length in Large Language Models

Xindi Wang, Mahsa Salmani, Parsa Omidi, Xiangyu Ren, Mehdi Rezagholizadeh, Armaghan Eshaghi

arXiv:2402.02244v322.1105 citationsh-index: 16IJCAI

Originality Synthesis-oriented

AI Analysis

It tackles the problem of limited sequence length in LLMs for researchers and practitioners, but is incremental as it compiles existing methods rather than introducing new ones.

This survey reviews techniques to extend the context length in large language models, addressing computational and memory limitations to enhance long-context understanding without proportional increases in requirements.

Recently, large language models (LLMs) have shown remarkable capabilities including understanding context, engaging in logical reasoning, and generating responses. However, this is achieved at the expense of stringent computational and memory requirements, hindering their ability to effectively support long input sequences. This survey provides an inclusive review of the recent techniques and methods devised to extend the sequence length in LLMs, thereby enhancing their capacity for long-context understanding. In particular, we review and categorize a wide range of techniques including architectural modifications, such as modified positional encoding and altered attention mechanisms, which are designed to enhance the processing of longer sequences while avoiding a proportional increase in computational requirements. The diverse methodologies investigated in this study can be leveraged across different phases of LLMs, i.e., training, fine-tuning and inference. This enables LLMs to efficiently process extended sequences. The limitations of the current methodologies is discussed in the last section along with the suggestions for future research directions, underscoring the importance of sequence length in the continued advancement of LLMs.

View on arXiv PDF

Similar