CL | AI | Jul 24, 2025

Enhancing RAG Efficiency with Adaptive Context Compression

arXiv:2507.22931v3 | 5 citations | h-index: 2 | EMNLP
Originality: Incremental advance
AI Analysis

This work addresses efficiency issues in RAG systems for users of large language models, offering a domain-specific improvement that is incremental over existing compression methods.

The paper tackles the high inference cost of retrieval-augmented generation (RAG) caused by lengthy retrieved contexts. It proposes Adaptive Context Compression for RAG (ACC-RAG), which dynamically adjusts the compression rate based on input complexity. Evaluated on Wikipedia and five QA datasets, the method is reported to give over 4 times faster inference than standard RAG while maintaining or improving accuracy.

Retrieval-augmented generation (RAG) enhances large language models (LLMs) with external knowledge but incurs significant inference costs due to lengthy retrieved contexts. While context compression mitigates this issue, existing methods apply fixed compression rates, over-compressing simple queries or under-compressing complex ones. We propose Adaptive Context Compression for RAG (ACC-RAG), a framework that dynamically adjusts compression rates based on input complexity, optimizing inference efficiency without sacrificing accuracy. ACC-RAG combines a hierarchical compressor (for multi-granular embeddings) with a context selector to retain minimal sufficient information, akin to human skimming. Evaluated on Wikipedia and five QA datasets, ACC-RAG outperforms fixed-rate methods and delivers over 4 times faster inference than standard RAG while maintaining or improving accuracy.
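The idea of pairing a multi-granular compressor with a selector that keeps only as much context as needed can be illustrated with a toy sketch. This is not the paper's implementation: mean pooling stands in for the learned hierarchical compressor, and the names `mean_pool`, `adaptive_compress`, `is_sufficient`, and `group_sizes` are hypothetical, introduced only for illustration. The sketch tries the coarsest compression first and moves to finer levels only when a caller-supplied sufficiency check fails, so simple inputs end up with higher compression rates than complex ones.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def mean_pool(vectors: Sequence[Vector], group_size: int) -> List[Vector]:
    """Average consecutive groups of vectors (a stand-in for a learned compressor)."""
    pooled = []
    for start in range(0, len(vectors), group_size):
        group = vectors[start:start + group_size]
        dim = len(group[0])
        pooled.append([sum(v[i] for v in group) / len(group) for i in range(dim)])
    return pooled


def adaptive_compress(
    token_vectors: Sequence[Vector],
    is_sufficient: Callable[[List[Vector]], bool],
    group_sizes: Sequence[int] = (8, 4, 2, 1),  # assumed levels, coarse to fine
) -> Tuple[List[Vector], float]:
    """Return the coarsest compressed context that the selector accepts.

    `is_sufficient` is a hypothetical placeholder for a context selector.
    If no level is accepted, the finest level in `group_sizes` is returned.
    """
    if not token_vectors:
        raise ValueError("token_vectors must not be empty")
    compressed: List[Vector] = list(token_vectors)
    for group_size in group_sizes:
        compressed = mean_pool(token_vectors, group_size)
        if is_sufficient(compressed):
            break  # stop at the first (coarsest) level judged sufficient
    rate = len(token_vectors) / len(compressed)
    return compressed, rate


if __name__ == "__main__":
    tokens = [[float(i), 1.0] for i in range(8)]
    # A "simple" input: any summary is accepted, so compression is maximal.
    _, simple_rate = adaptive_compress(tokens, lambda emb: True)
    # A "harder" input: the selector demands at least two embeddings.
    _, harder_rate = adaptive_compress(tokens, lambda emb: len(emb) >= 2)
    print(simple_rate, harder_rate)
```

In this toy setup the compression rate is simply the number of input vectors divided by the number of embeddings kept. A real system would replace both the pooling step and the sufficiency check with trained components.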

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
