CL, Dec 21, 2022

Parallel Context Windows for Large Language Models

arXiv:2212.10947v3, 257 citations, h-index: 72, Has Code
Originality: Highly original
AI Analysis

This paper addresses the problem of handling long text sequences for users of off-the-shelf LLMs without requiring retraining, offering a practical solution for applications that require extended context.

The paper tackles the context window limitation in large language models (LLMs) by introducing Parallel Context Windows (PCW), a method that splits long text into windows, restricts attention to within each window, and reuses positional embeddings across windows. It reports substantial improvements on in-context learning, multi-hop questions, and retrieval-augmented QA, with in-context learning tested on models of up to 178 billion parameters.

When applied to processing long text, Large Language Models (LLMs) are limited by their context window. Existing efforts to address this limitation involve training specialized architectures, and cannot be easily applied to off-the-shelf LLMs. We present Parallel Context Windows (PCW), a method that alleviates the context window restriction for any off-the-shelf LLM without further training. The key to the approach is to carve a long context into chunks ("windows"), restrict the attention mechanism to apply only within each window, and re-use the positional embeddings across the windows. Our main results test the PCW approach on in-context learning with models that range in size between 750 million and 178 billion parameters, and show substantial improvements for tasks with diverse input and output spaces. We show additional benefits in other settings where long context windows may be beneficial: multi-hop questions and retrieval-augmented question answering with multiple retrieved documents. Our results highlight Parallel Context Windows as a promising method for applying off-the-shelf LLMs in a range of settings that require long text sequences. We make our code publicly available at https://github.com/ai21labs/parallel-context-windows.
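
The three ingredients named in the abstract (chunking, window-restricted attention, reused positions) can be illustrated with a small sketch. This is not the authors' implementation; the function name `pcw_layout` and its parameters are hypothetical. It also assumes a layout the abstract does not spell out: the task tokens that follow the windows attend to every window and take positions continuing after the longest window.

```python
def pcw_layout(window_lengths, num_task_tokens):
    """Build position ids and a boolean attention mask for a sequence laid
    out as [window 0 | window 1 | ... | task tokens].

    Hypothetical helper for illustration only.
    mask[q][k] is True when query token q may attend to key token k.
    """
    max_len = max(window_lengths) if window_lengths else 0
    total = sum(window_lengths) + num_task_tokens

    position_ids = []
    window_of = []  # window index of each token; -1 marks a task token
    for w, length in enumerate(window_lengths):
        # Reuse positions: every window starts again at position 0.
        position_ids.extend(range(length))
        window_of.extend([w] * length)

    # Assumption: task tokens continue after the longest window's positions.
    position_ids.extend(range(max_len, max_len + num_task_tokens))
    window_of.extend([-1] * num_task_tokens)

    mask = [[False] * total for _ in range(total)]
    for q in range(total):
        for k in range(q + 1):  # causal: only earlier tokens or itself
            same_window = window_of[q] == window_of[k]
            is_task_query = window_of[q] == -1
            # Context tokens see only their own window;
            # task tokens (by assumption) see all windows.
            if is_task_query or same_window:
                mask[q][k] = True
    return position_ids, mask


position_ids, mask = pcw_layout([3, 3], num_task_tokens=2)
print(position_ids)
```

With two windows of three tokens each, both windows receive positions 0 to 2, so no position index exceeds what a single window plus the task tokens would need. That is what lets the combined context exceed the model's trained window without new positional embeddings.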

Code Implementations: 1 repo
