Efficient Context Selection for Long-Context QA: No Tuning, No Iteration, Just Adaptive-$k$
This work addresses the challenge of efficient context selection for open-domain QA, improving accuracy and token efficiency across several models. The contribution is incremental, since it builds on existing retrieval methods.
The paper tackles the problem of choosing how many passages to retrieve for retrieval-augmented generation in long-context QA. It presents Adaptive-$k$ retrieval, which chooses the context size from the query-passage similarity scores without tuning or iteration. The method matches or outperforms fixed-$k$ baselines while using up to 10x fewer tokens than full-context input and still retrieving 70% of relevant passages.
Retrieval-augmented generation (RAG) and long-context language models (LCLMs) both address context limitations of LLMs in open-domain question answering (QA). However, determining the optimal amount of external context to retrieve remains an open problem: fixing the retrieval size risks either wasting tokens or omitting key evidence. Existing adaptive methods like Self-RAG and Self-Route rely on iterative LLM prompting and perform well on factoid QA, but struggle with aggregation QA, where the optimal context size is both unknown and variable. We present Adaptive-$k$ retrieval, a simple and effective single-pass method that adaptively selects the number of passages based on the distribution of the similarity scores between the query and the candidate passages. It does not require model fine-tuning, extra LLM inferences, or changes to existing retriever-reader pipelines. On both factoid and aggregation QA benchmarks, Adaptive-$k$ matches or outperforms fixed-$k$ baselines while using up to 10x fewer tokens than full-context input, yet still retrieves 70% of relevant passages. It improves accuracy across five LCLMs and two embedding models, highlighting that dynamically adjusting context size leads to more efficient and accurate QA.
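To make the idea of choosing the number of passages from the score distribution concrete, here is a minimal sketch. The abstract does not spell out the selection rule, so the rule used here is an assumption: sort the similarity scores in descending order and cut at the largest drop between consecutive scores. The function names `adaptive_k` and `select_passages` and the `min_k` parameter are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def adaptive_k(scores: Sequence[float], min_k: int = 1) -> int:
    """Choose a cutoff k from the shape of the similarity scores.

    Assumption (not stated in the abstract): k is placed at the largest
    drop between consecutive scores once they are sorted in descending order.
    """
    if len(scores) == 0:
        return 0
    ranked = sorted(scores, reverse=True)
    if len(ranked) <= min_k:
        return len(ranked)
    # gaps[i] is the drop from the passage at rank i to the one at rank i + 1.
    gaps = [ranked[i] - ranked[i + 1] for i in range(len(ranked) - 1)]
    # Only consider cut points that keep at least min_k passages.
    start = max(min_k - 1, 0)
    best = max(range(start, len(gaps)), key=lambda i: gaps[i])
    return best + 1


def select_passages(
    passages: Sequence[str], scores: Sequence[float], min_k: int = 1
) -> List[str]:
    """Return the top-k passages, with k chosen by adaptive_k."""
    k = adaptive_k(scores, min_k)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [passages[i] for i in order[:k]]


if __name__ == "__main__":
    # Hypothetical scores: three passages clearly stand out from the rest.
    demo_scores = [0.91, 0.12, 0.88, 0.15, 0.86, 0.10]
    demo_passages = ["a", "b", "c", "d", "e", "f"]
    print(adaptive_k(demo_scores))
    print(select_passages(demo_passages, demo_scores))
```

Under this assumed rule the selection is a single pass over scores the retriever already computes, which is consistent with the abstract's claim that no extra LLM inference or pipeline change is needed. A factoid query with one strong match would yield a small k, while an aggregation query with many comparably scored passages would yield a larger one.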