E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning
E2LLM targets long-context processing for LLM applications such as multi-turn dialogue and document summarization, proposing a new method for a known bottleneck.
The paper tackles the challenge of enabling large language models to process long contexts efficiently and effectively. It introduces E2LLM, which outperforms 8 state-of-the-art methods on document summarization and question answering and achieves the best performance on LongBench v2 among models of comparable size.
Processing long contexts is increasingly important for Large Language Models (LLMs) in tasks like multi-turn dialogues, code generation, and document summarization. This paper addresses the challenge of simultaneously achieving high long-context performance, low computational complexity, and compatibility with pretrained models, three goals collectively termed the "impossible triangle". We introduce E2LLM (Encoder Elongated Large Language Models), a novel approach that effectively navigates this trade-off. E2LLM divides long contexts into chunks, compresses each chunk into soft prompts using a pretrained text encoder, and aligns these representations with a decoder-only LLM via an adapter. To enhance the LLM's reasoning with these soft prompts, we employ two training objectives: encoder output reconstruction and long-context instruction fine-tuning. Extensive experiments show that E2LLM not only outperforms 8 state-of-the-art (SOTA) methods in effectiveness and efficiency for document summarization and question answering, but also achieves the best performance on LongBench v2 among models of comparable size.
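The chunk, compress, and align steps described above can be illustrated with a toy sketch. This is not the authors' implementation: the encoder here is a stand-in that averages hashed token ids, the adapter is a single random linear map, and every name (`chunk_tokens`, `toy_encode`, `make_adapter`, `compress_context`) as well as the choice of one soft-prompt vector per chunk is an assumption made for illustration. In a real system the encoder would be a pretrained text encoder and the adapter would be trained.

```python
import random

def chunk_tokens(tokens, chunk_size):
    """Split a token-id list into consecutive chunks of at most chunk_size."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]

def toy_encode(chunk, enc_dim):
    """Stand-in for a pretrained text encoder (assumption): a normalized
    bag-of-token-ids vector of length enc_dim."""
    vec = [0.0] * enc_dim
    for tok_id in chunk:
        vec[tok_id % enc_dim] += 1.0
    return [v / len(chunk) for v in vec]

def make_adapter(enc_dim, llm_dim, seed=0):
    """Hypothetical adapter: one untrained linear map from the encoder
    dimension to the decoder's embedding dimension."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(enc_dim)]
            for _ in range(llm_dim)]

def apply_adapter(weights, vec):
    """Matrix-vector product: project an encoder vector into the LLM space."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]

def compress_context(tokens, chunk_size, enc_dim, llm_dim):
    """Chunk the context, encode each chunk, and project it with the adapter.
    Returns one soft-prompt vector per chunk (an illustrative simplification)."""
    adapter = make_adapter(enc_dim, llm_dim)
    chunks = chunk_tokens(tokens, chunk_size)
    return [apply_adapter(adapter, toy_encode(c, enc_dim)) for c in chunks]

context = list(range(1000))  # pretend token ids for a long document
soft_prompts = compress_context(context, chunk_size=128, enc_dim=16, llm_dim=32)
print(len(context), "tokens ->", len(soft_prompts), "soft prompt vectors")
# prints "1000 tokens -> 8 soft prompt vectors"
```

In this sketch the decoder would attend over 8 vectors instead of 1000 token embeddings, which is where the reduction in computational cost comes from; the quality of the result then depends on how well the encoder and adapter preserve the information in each chunk.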