CL AI IR LG | Mar 8, 2024

Can't Remember Details in Long Documents? You Need Some R&R

Devanshu Agrawal, Shang Gao, Martin Gajek

arXiv:2403.05004v115.729 citations | h-index: 2 | Has Code | EMNLP

Originality: Incremental advance

AI Analysis

The paper addresses a specific bottleneck for LLM users doing long-document QA and offers an incremental improvement over existing methods.

The paper tackles the problem of long-context LLMs missing important information in document-based QA by introducing R&R, a combination of reprompting and in-context retrieval, which boosts QA accuracy by 16 points on average for documents up to 80k tokens.

Long-context large language models (LLMs) hold promise for tasks such as question-answering (QA) over long documents, but they tend to miss important information in the middle of context documents (arXiv:2307.03172v3). Here, we introduce $\textit{R&R}$ -- a combination of two novel prompt-based methods called $\textit{reprompting}$ and $\textit{in-context retrieval}$ (ICR) -- to alleviate this effect in document-based QA. In reprompting, we repeat the prompt instructions periodically throughout the context document to remind the LLM of its original task. In ICR, rather than instructing the LLM to answer the question directly, we instruct it to retrieve the top $k$ passage numbers most relevant to the given question, which are then used as an abbreviated context in a second QA prompt. We test R&R with GPT-4 Turbo and Claude-2.1 on documents up to 80k tokens in length and observe a 16-point boost in QA accuracy on average. Our further analysis suggests that R&R improves performance on long document-based QA because it reduces the distance between relevant context and the instructions. Finally, we show that compared to short-context chunkwise methods, R&R enables the use of larger chunks that cost fewer LLM calls and output tokens, while minimizing the drop in accuracy.
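The two steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The function names, the prompt wording, the word-based passage splitting, and the `llm` callable (any function that maps a prompt string to a reply string) are all assumptions made for this example.

```python
import re
from typing import Callable, List


def split_passages(document: str, words_per_passage: int = 50) -> List[str]:
    """Split a document into fixed-size word chunks (an assumed chunking scheme)."""
    words = document.split()
    return [
        " ".join(words[i:i + words_per_passage])
        for i in range(0, len(words), words_per_passage)
    ]


def build_icr_prompt(passages: List[str], question: str,
                     k: int = 3, remind_every: int = 4) -> str:
    """Build an in-context retrieval prompt with reprompting.

    Passages are numbered, and the instruction is repeated after every
    `remind_every` passages so it never sits far from any part of the context.
    """
    instruction = (
        f"Return the numbers of the {k} passages most relevant "
        f"to the question: {question}"
    )
    parts = [instruction]
    for n, passage in enumerate(passages, start=1):
        parts.append(f"[{n}] {passage}")
        # Reprompting: periodically remind the model of its task.
        if n % remind_every == 0 and n < len(passages):
            parts.append(f"Reminder: {instruction}")
    return "\n\n".join(parts)


def parse_passage_numbers(reply: str, n_passages: int, k: int) -> List[int]:
    """Extract up to k distinct, in-range passage numbers from the model reply."""
    seen: List[int] = []
    for token in re.findall(r"\d+", reply):
        number = int(token)
        if 1 <= number <= n_passages and number not in seen:
            seen.append(number)
    return seen[:k]


def rr_answer(document: str, question: str, llm: Callable[[str], str],
              k: int = 3, words_per_passage: int = 50,
              remind_every: int = 4) -> str:
    """Run retrieval with reprompting, then answer from the abbreviated context."""
    passages = split_passages(document, words_per_passage)
    reply = llm(build_icr_prompt(passages, question, k, remind_every))
    ids = parse_passage_numbers(reply, len(passages), k)
    # Second prompt: only the retrieved passages, kept in document order.
    context = "\n\n".join(passages[i - 1] for i in sorted(ids))
    return llm(f"{context}\n\nQuestion: {question}\nAnswer:")
```

To try it, pass any function that wraps a chat model as `llm`. The reminder interval and the value of `k` are tuning knobs in this sketch; the abstract does not specify the values the authors used.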
