SARA: Selective and Adaptive Retrieval-augmented Generation with Context Compression
This work addresses the challenge of efficient and accurate knowledge retrieval for users of large language models, though the contribution is incremental because it builds on existing RAG methods.
The paper tackles limited context length and redundancy in retrieval-augmented generation (RAG) by proposing SARA, a framework that combines natural-language text snippets with semantic compression vectors. It reports improvements of +17.71 in answer relevance, +13.72 in answer correctness, and +15.53 in semantic similarity across multiple datasets and models.
Retrieval-augmented Generation (RAG) extends large language models (LLMs) with external knowledge but faces key challenges: restricted effective context length and redundancy in retrieved documents. Pure compression-based approaches reduce input size but often discard fine-grained details essential for factual accuracy. We propose SARA, a unified RAG framework that balances local precision and global knowledge coverage under tight context budgets. SARA combines natural-language text snippets with semantic compression vectors to jointly enhance context efficiency and answer correctness. It represents contexts at two complementary levels: 1) fine-grained natural-language spans that preserve critical entities and numerical values, and 2) compact, interpretable vectors that summarize high-level semantics. An iterative evidence-selection module employs the compression vectors for dynamic reranking of contexts. Across 9 datasets and 5 open-source LLMs spanning 3 model families (Mistral, Llama, and Gemma), SARA consistently improves answer relevance (+17.71), answer correctness (+13.72), and semantic similarity (+15.53), demonstrating the importance of integrating textual and compressed representations for robust, context-efficient RAG.
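To make the two-level representation and the iterative evidence selection more concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes each retrieved chunk already carries a fixed-size vector standing in for a compression vector, and it swaps in a simple greedy, MMR-style rule (relevance to the query minus similarity to already selected evidence) for the reranking step. The names `Chunk`, `select_evidence`, `build_context`, `redundancy_weight`, and `text_budget` are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str        # natural-language span (fine-grained level)
    vector: list     # stand-in for a compression vector (coarse level)
    n_tokens: int    # cost of including the chunk as plain text


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def select_evidence(query_vec, chunks, k, redundancy_weight=0.5):
    """Greedy iterative selection (assumed MMR-style rule, not SARA's own).

    Each round rescores the remaining chunks against what has already
    been picked, so near-duplicates of selected evidence are pushed down.
    """
    selected = []
    remaining = list(range(len(chunks)))
    while remaining and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, chunks[i].vector)
            redundancy = max(
                (cosine(chunks[i].vector, chunks[j].vector) for j in selected),
                default=0.0,
            )
            return relevance - redundancy_weight * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


def build_context(selected, chunks, text_budget):
    """Keep chunks as text while the token budget allows; the rest
    would be passed to the model in compressed (vector) form."""
    as_text, as_vector, used = [], [], 0
    for i in selected:
        if used + chunks[i].n_tokens <= text_budget:
            as_text.append(i)
            used += chunks[i].n_tokens
        else:
            as_vector.append(i)
    return as_text, as_vector


# Toy data: chunk 1 is a near-duplicate of chunk 0.
chunks = [
    Chunk("A", [1.0, 0.1, 0.0], 5),
    Chunk("A'", [1.0, 0.05, 0.0], 5),
    Chunk("B", [0.0, 1.0, 0.0], 5),
    Chunk("C", [0.0, 0.0, 1.0], 5),
]
order = select_evidence([1.0, 1.0, 0.0], chunks, k=3)
as_text, as_vector = build_context(order, chunks, text_budget=10)
```

In this toy run the near-duplicate chunk is selected only after the complementary chunk, and once the text budget is exhausted it falls back to the compressed form. How SARA actually scores, reranks, and injects compression vectors into the LLM is described in the paper itself and is not reproduced here.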