CL · Jul 12, 2024

CompAct: Compressing Retrieved Documents Actively for Question Answering

arXiv:2407.09014v3 · 58 citations · h-index: 9
Originality: Incremental advance
AI Analysis

This paper addresses information overload in retrieval-augmented generation for question answering and offers a cost-efficient solution. The contribution is incremental, however, because it builds on existing context compression methods.

The paper tackles a common failure: language models struggle when given many retrieved documents for question answering. It introduces CompAct, a framework that actively compresses those documents, and reports a 47x compression rate along with significant performance improvements on multi-hop QA benchmarks.
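The 47x figure is a compression rate. This page does not define it, so the sketch below assumes the usual definition: the number of tokens in the retrieved input divided by the number of tokens in the compressed output. Whitespace splitting stands in for a real tokenizer, and `compression_rate` is a hypothetical helper, not code from the paper.

```python
def compression_rate(original_docs: list[str], compressed: str) -> float:
    """Input tokens divided by output tokens.

    Assumption: whitespace splitting approximates tokenization; a real
    evaluation would count tokens with the reader model's tokenizer.
    """
    original_tokens = sum(len(doc.split()) for doc in original_docs)
    compressed_tokens = max(len(compressed.split()), 1)  # guard against an empty summary
    return original_tokens / compressed_tokens


if __name__ == "__main__":
    docs = ["alpha " * 470]               # hypothetical 470-token retrieved context
    summary = " ".join(["beta"] * 10)     # hypothetical 10-token compressed context
    print(compression_rate(docs, summary))  # 470 / 10 = 47.0
```

Under this definition, a 47x rate means the reader sees roughly one token for every 47 tokens that were retrieved.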

Retrieval-augmented generation helps language models strengthen their factual grounding by providing external context. However, language models often struggle when given extensive information, which diminishes their effectiveness at answering questions. Context compression tackles this issue by filtering out irrelevant information, but current methods still struggle in realistic scenarios where crucial information cannot be captured with a single-step approach. To overcome this limitation, we introduce CompAct, a novel framework that employs an active strategy to condense extensive documents without losing key information. Our experiments demonstrate that CompAct brings significant improvements in both performance and compression rate on multi-hop question-answering benchmarks. CompAct operates flexibly as a cost-efficient plug-in module with various off-the-shelf retrievers or readers, achieving exceptionally high compression rates (47x).
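One way to read "active strategy" in contrast to single-step compression is as an iterative loop. In this assumed reading, the retrieved documents are split into segments, and a compressor updates a running summary with each segment, keeping only question-relevant facts. The loop stops early once the compressor judges that the summary is enough to answer. In CompAct the compressor is a trained language model. Here `toy_compressor` is a hypothetical keyword-matching stand-in so that the sketch runs. `active_compress`, `StepResult`, and `segment_size` are illustrative names, not the paper's API.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical stopword list used only by the toy compressor below.
STOPWORDS = {"where", "was", "the", "of", "who", "what", "which", "is", "in", "a"}


@dataclass
class StepResult:
    summary: str       # running compressed context after this step
    is_complete: bool  # compressor's judgment that the summary can answer the question


# Hypothetical signature: (question, previous summary, document segment) -> StepResult
Compressor = Callable[[str, str, list[str]], StepResult]


def _words(text: str) -> set[str]:
    return {w.strip("?.,!").lower() for w in text.split()}


def toy_compressor(question: str, summary: str, segment: list[str]) -> StepResult:
    """Hypothetical stand-in for an LLM compressor: keeps sentences that share a
    keyword with the question and reports completion once every keyword is covered."""
    keywords = _words(question) - STOPWORDS
    kept = [summary] if summary else []
    for doc in segment:
        for sentence in re.split(r"(?<=[.!?])\s+", doc):
            if _words(sentence) & keywords and sentence not in summary:
                kept.append(sentence)
    new_summary = " ".join(kept)
    return StepResult(new_summary, keywords <= _words(new_summary))


def active_compress(question: str, documents: list[str], compress: Compressor,
                    segment_size: int = 5) -> str:
    """Compress documents segment by segment, carrying the summary forward and
    stopping early once the compressor reports that the summary is sufficient."""
    summary = ""
    for start in range(0, len(documents), segment_size):
        result = compress(question, summary, documents[start:start + segment_size])
        summary = result.summary
        if result.is_complete:
            break  # early termination: remaining segments are never read
    return summary


if __name__ == "__main__":
    docs = [
        "Steven Spielberg is the director of Jaws.",
        "Bananas are yellow.",
        "Spielberg was born in Cincinnati.",
        "Paris is in France.",
        "Rome is in Italy.",
    ]
    print(active_compress("Where was the director of Jaws born?", docs,
                          toy_compressor, segment_size=2))
```

In this toy run, the first segment supplies only the director fact, so the loop reads a second segment. That segment supplies the birthplace, and the fifth document is never processed. This multi-step evidence gathering is what a single-pass filter can miss in multi-hop questions.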

Code Implementations: 1 repo