CL · AI · Apr 20

Latent Abstraction for Retrieval-Augmented Generation

Ha Lan N. T, Minh-Anh Nguyen, Dung D. Le

arXiv:2604.17866 · 88.0 · h-index: 4

Predicted impact: top 40% in CL · last 90 days · Originality: Highly original

AI Analysis

For practitioners of retrieval-augmented generation, LAnR offers a more efficient and integrated approach that eliminates separate retriever components and explicit stopping logic.

LAnR unifies encoding, retrieval, and generation in a single LLM's latent space, outperforming existing RAG methods on six QA benchmarks while reducing retrieval calls and improving inference efficiency.

Retrieval-Augmented Generation (RAG) has become a standard approach for enhancing large language models (LLMs) with external knowledge, mitigating hallucinations, and improving factuality. However, existing systems rely on generating natural language queries at each hop and maintain a strict architectural separation between retriever and generator, which prevents them from leveraging the full representational capacity of the LLM. We propose LAnR (Latent Abstraction for RAG), a unified framework in which a single LLM jointly performs encoding, retrieval, and generation entirely within its own latent space. Rather than generating textual queries, LAnR produces dense retrieval vectors from the hidden states of a designated [PRED] token and matches them against encoded document representations from the same model. Furthermore, LAnR adaptively decides when sufficient evidence has been retrieved using a lightweight MLP control head over those same hidden states, eliminating both the separate retriever and explicit token-level stopping reasoning. This design is motivated by our empirical observation that answer token entropy reliably signals retrieval sufficiency. Extensive experiments on six QA benchmarks spanning single-hop and multi-hop settings demonstrate that LAnR outperforms existing RAG methods while improving inference efficiency through fewer retrieval calls and tighter model integration.
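The retrieval loop described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the cosine-similarity scoring, the two-layer ReLU head, the 0.5 threshold, the order of the stop check, and every name below (`retrieve`, `stop_probability`, `latent_rag_loop`, `pred_hidden_fn`) are assumptions made for illustration. `pred_hidden_fn` is a hypothetical stand-in for running the LLM and reading the hidden state at the [PRED] token. `token_entropy` shows the quantity the abstract says signals retrieval sufficiency.

```python
import numpy as np

def token_entropy(probs):
    """Shannon entropy (in nats) of an answer-token distribution."""
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]  # skip zero entries, treating 0 * log(0) as 0
    return float(-(p * np.log(p)).sum())

def retrieve(query_vec, doc_vecs, k=1):
    """Indices of the top-k documents by cosine similarity (assumed metric)."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q
    return np.argsort(-scores)[:k].tolist()

def stop_probability(hidden, w1, b1, w2, b2):
    """Two-layer MLP head mapping a hidden state to P(evidence is sufficient)."""
    h = np.maximum(0.0, hidden @ w1 + b1)  # ReLU layer (assumed)
    logit = float(h @ w2 + b2)
    return 1.0 / (1.0 + np.exp(-logit))

def latent_rag_loop(pred_hidden_fn, doc_vecs, head, max_hops=4, threshold=0.5):
    """Retrieve until the control head signals sufficiency or max_hops is hit.

    pred_hidden_fn(retrieved) is a hypothetical callable returning the
    [PRED] hidden state given the document indices retrieved so far.
    """
    retrieved = []
    for _ in range(max_hops):
        hidden = pred_hidden_fn(retrieved)
        if stop_probability(hidden, *head) >= threshold:
            break  # the head judges the current evidence sufficient
        # The same hidden state doubles as the dense query vector.
        retrieved.append(retrieve(hidden, doc_vecs, k=1)[0])
    return retrieved
```

The point of the sketch is that one vector serves two roles: it is scored against the document representations as a query, and it is fed to the control head to decide whether another hop is needed, so no textual query or separate retriever model appears anywhere in the loop.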

