CLAIIR · Oct 11, 2024

Enhancing Long Context Performance in LLMs Through Inner Loop Query Mechanism

arXiv:2410.12859v1 · 2 citations · h-index: 4
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of answering complex queries over long contexts for LLM users, though the advance is incremental because it builds on existing RAG methods.

The paper tackles two limits on long-context performance in LLMs: the quadratic scaling of transformer computation and RAG retrieval that relies only on the initial query. It introduces an inner-loop query mechanism with short-term memory and reports improvements on benchmarks such as M-NIAH and BABILong.

The computational complexity of Transformers scales quadratically with input size, which limits the input context window of large language models (LLMs) in both training and inference. Meanwhile, retrieval-augmented generation (RAG) based models can better handle longer contexts by using a retrieval system to filter out unnecessary information. However, most RAG methods perform retrieval based only on the initial query, which may not work well for complex questions that require deeper reasoning. We introduce a novel approach, Inner Loop Memory Augmented Tree Retrieval (ILM-TR), involving inner-loop queries that are based not only on the query question itself but also on intermediate findings. At inference time, our model retrieves information from the RAG system, integrating data from lengthy documents at various levels of abstraction. Based on the information retrieved, the LLM generates text that is stored in an area named Short-Term Memory (STM), which is then used to formulate the next query. This retrieval process is repeated until the text in STM converges. Our experiments demonstrate that retrieval with STM offers improvements over traditional retrieval-augmented LLMs, particularly in long-context tests such as Multi-Needle In A Haystack (M-NIAH) and BABILong.
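
The retrieve, update STM, re-query loop described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the word-overlap retriever, the `toy_llm` stand-in (which simply appends newly retrieved chunks to memory), and all function names and parameters are assumptions made for illustration, and the tree-structured index over multiple levels of abstraction is omitted. The sketch only shows the control flow, in which each query combines the question with the current STM and the loop stops once the STM no longer changes.

```python
import re
from typing import Callable, List, Tuple


def words(text: str) -> set:
    """Lowercased word set used by the toy retriever."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query: str, chunks: List[str], k: int = 2) -> List[str]:
    """Toy retriever (assumption): rank chunks by word overlap with the query."""
    q = words(query)
    hits = [(len(q & words(c)), c) for c in chunks]
    hits = [(score, c) for score, c in hits if score > 0]
    hits.sort(key=lambda sc: sc[0], reverse=True)  # stable sort keeps input order on ties
    return [c for _, c in hits[:k]]


def toy_llm(question: str, stm: str, retrieved: List[str]) -> str:
    """Stand-in for the LLM (assumption): append chunks not yet in memory."""
    notes = stm
    for chunk in retrieved:
        if chunk not in notes:
            notes = f"{notes} {chunk}".strip()
    return notes


def inner_loop(
    question: str,
    chunks: List[str],
    llm: Callable[[str, str, List[str]], str] = toy_llm,
    max_iters: int = 5,
) -> Tuple[str, int]:
    """Repeat retrieval until the short-term memory (STM) stops changing."""
    stm = ""
    for step in range(1, max_iters + 1):
        # The next query uses the question plus intermediate findings in STM.
        query = f"{question} {stm}".strip()
        new_stm = llm(question, stm, retrieve(query, chunks))
        if new_stm == stm:  # convergence: nothing new was added
            return stm, step
        stm = new_stm
    return stm, max_iters


chunks = [
    "Alice lives in Paris.",
    "Paris is the capital of France.",
    "France uses the euro.",
    "Bob likes tea.",
]
question = "Where does Alice live?"
single_shot = retrieve(question, chunks)
memory, steps = inner_loop(question, chunks)
print(single_shot)
print(memory, steps)
```

In this toy run, retrieval from the initial question alone surfaces only the chunk that mentions Alice, while the second pass, whose query includes the STM, also reaches the chunk about Paris. A real system would replace `toy_llm` with a model call that summarizes or reasons over the retrieved text, and would likely need a softer convergence test than exact string equality.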

