IR · IT · Apr 20

Context-Aware Search and Retrieval Under Token Erasure

arXiv:2604.18424 · 45.8 · h-index: 6
Predicted impact: top 78% in IR · last 90 days · Originality: Synthesis-oriented
AI Analysis

For developers of retrieval-augmented generation systems, this work provides an information-theoretic analysis and practical principles to improve retrieval reliability under token loss.

The paper analyzes retrieval reliability in RAG-like systems under token erasures, showing that assigning higher redundancy to semantically important query features improves retrieval reliability. Numerical results and data-driven evaluation on real-world data support the analysis.

This paper introduces and analyzes a search and retrieval model for RAG-like systems under token erasures. We provide an information-theoretic analysis of remote document retrieval when query representations are only partially preserved. The query is represented using term-frequency-based features, and semantically adaptive redundancy is assigned according to feature importance. Retrieval is performed using TF-IDF-weighted similarity. We characterize the retrieval error probability by showing that the vector of similarity margins converges to a multivariate Gaussian distribution, yielding an explicit approximation and computable upper bounds. Numerical results support the analysis, while a separate data-driven evaluation using embedding-based retrieval on real-world data shows that the same importance-aware redundancy principles extend to modern retrieval pipelines. Overall, the results show that assigning higher redundancy to semantically important query features improves retrieval reliability.
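As a rough illustration of the closing claim, the sketch below compares a uniform redundancy allocation with an importance-aware one on a toy example. Everything in it is an assumption made for illustration rather than the paper's model: the three-document corpus, the repetition-style erasure channel (each copy of a query term is erased independently with probability `p`, and a term survives if any copy survives), the unnormalized TF-IDF score with a smoothed IDF, and the deliberately crude allocation rule in the hypothetical helper `importance_aware_copies` (one copy per term, all spare copies to the highest-IDF term). The error probability is computed exactly by enumerating survival patterns, so no random sampling is involved.

```python
import math
from itertools import product

# Toy corpus (hypothetical data, for illustration only).
DOCS = {
    "d1": "gaussian channel capacity bound".split(),
    "d2": "gaussian process regression model".split(),
    "d3": "channel coding erasure model".split(),
}


def idf(term):
    """Smoothed inverse document frequency over the toy corpus."""
    df = sum(term in toks for toks in DOCS.values())
    return math.log((1 + len(DOCS)) / (1 + df)) + 1.0


def score(terms, doc_tokens):
    """Unnormalized TF-IDF-weighted match between surviving terms and a document."""
    return sum(doc_tokens.count(t) * idf(t) ** 2 for t in terms)


def retrieves_target(terms, target):
    """True only if the target document scores strictly higher than all others."""
    scores = {name: score(terms, toks) for name, toks in DOCS.items()}
    return all(scores[target] > s for name, s in scores.items() if name != target)


def error_probability(query, target, copies, p):
    """Exact retrieval error probability under the assumed erasure model.

    copies[i] is the number of redundant copies sent for query[i]; the term
    is lost only if all of its copies are erased (probability p ** copies[i]).
    """
    total = 0.0
    for pattern in product([True, False], repeat=len(query)):
        prob = 1.0
        for kept, r in zip(pattern, copies):
            lost = p ** r
            prob *= (1 - lost) if kept else lost
        surviving = [t for t, kept in zip(query, pattern) if kept]
        if not retrieves_target(surviving, target):
            total += prob
    return total


def importance_aware_copies(query, budget):
    """Crude rule (an assumption): one copy each, spare copies to the top-IDF term."""
    copies = [1] * len(query)
    top = max(range(len(query)), key=lambda i: idf(query[i]))
    copies[top] += budget - len(query)
    return copies


QUERY = ["gaussian", "channel", "capacity"]
uniform = [2, 2, 2]                                  # same total budget of 6 copies
aware = importance_aware_copies(QUERY, budget=6)     # favors the rarest term
err_uniform = error_probability(QUERY, "d1", uniform, p=0.5)
err_aware = error_probability(QUERY, "d1", aware, p=0.5)
print(aware, err_uniform, err_aware)
```

In this toy setting the rare term "capacity" is the only one that identifies `d1` on its own, so moving the spare copies onto it lowers the exact error probability from 0.109375 to 0.046875 at the same total budget. The numbers depend entirely on the assumed corpus and channel and say nothing about the paper's reported results.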

