IR · AI · CL · May 23

Spectral Retrieval: Multi-Scale Sinc Convolution over Token Embeddings for Localized Retrieval in LLM Multi-Agent Systems

arXiv:2605.24764 · 5.8
Predicted impact: top 86% in IR · last 90 days
Originality: Incremental advance
AI Analysis

For LLM multi-agent systems needing precise, role-specific retrieval, this plug-in re-ranking method offers a significant improvement over standard dense retrieval with no additional training.

Spectral Retrieval addresses the problem of localized relevance in dense retrieval, where mean-pooling dilutes signal from short relevant spans. By applying multi-scale sinc convolution over token embeddings, it achieves Recall@10 of 1.0 on synthetic benchmarks (vs. 0.02 for mean-pooling) and lifts Recall@10 from 0.33 to 0.90 on LIMIT-small without retraining.
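The dilution effect can be illustrated with a small NumPy sketch. It uses made-up random unit vectors rather than real encoder output, and the function names, document length, and embedding dimension are assumptions chosen only for illustration: one token is set equal to the query, and the cosine of the mean-pooled document vector is compared with the best per-token cosine.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def dilution_demo(num_tokens=256, dim=64, seed=0):
    """Return (mean-pool cosine, best per-token cosine) for a toy document.

    All data here is synthetic: random unit-norm "token embeddings" with a
    single planted token that equals the query exactly.
    """
    rng = np.random.default_rng(seed)
    tokens = rng.normal(size=(num_tokens, dim))
    tokens /= np.linalg.norm(tokens, axis=1, keepdims=True)

    query = rng.normal(size=dim)
    query /= np.linalg.norm(query)
    tokens[0] = query  # the one relevant position in the document

    pooled = cosine(tokens.mean(axis=0), query)        # single-vector view
    peak = max(cosine(tok, query) for tok in tokens)   # per-token view
    return pooled, peak
```

Calling `dilution_demo()` returns the two cosines side by side. The per-token maximum sees the planted token at full strength, while the pooled vector averages it together with every unrelated token in the document.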

[Abridged] - Spectral Retrieval is a plug-in re-ranking stage that interpolates between per-token MaxSim and mean-pool retrieval through a multi-scale sinc convolution over token embeddings. In standard dense retrieval each document is one mean-pooled vector; when relevance localises into a short subspan, the signal averages into noise. Spectral Retrieval reuses per-token embeddings from a late-interaction index and convolves them with a normalised sinc kernel at multiple scales. At L=1 the kernel acts as the identity, recovering per-token MaxSim; as L grows it approaches a uniform filter, recovering mean pooling. The maximum cosine over positions and scales yields a score provably no less informative than either endpoint. On a controlled synthetic benchmark with 1,000 documents and planted single-position spikes, mean-pool retrieval sits at chance (Recall@10 ~ 0.02) regardless of spike strength, while Spectral Retrieval reaches Recall@10 = 1.0 once the planted cosine exceeds the corpus-level token noise floor. On LIMIT-small with a frozen all-mpnet-base-v2 encoder, Spectral Retrieval lifts Recall@10 from 0.33 to 0.90, MRR from 0.22 to 0.79, and strict Success@10 from 0.12 to 0.84, without retraining. The method fits naturally into multi-agent LLM systems, where each agent benefits from a tighter, role-specific retrieval window over a shared corpus.
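The scoring rule described above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the kernel is assumed to be sinc((t - s) / L) evaluated over the whole document, each row of weights is normalised by its absolute sum, the query is a single vector rather than a set of query tokens, and the function names and default scales are hypothetical.

```python
import numpy as np


def sinc_weights(num_tokens, scale):
    """Row-normalised sinc weights between every pair of token positions.

    Assumption: weight[t, s] = sinc((t - s) / scale), where np.sinc is the
    normalised sinc sin(pi x) / (pi x). At scale 1 this is the identity up
    to floating-point error; at very large scales it is close to uniform.
    """
    idx = np.arange(num_tokens)
    weights = np.sinc((idx[:, None] - idx[None, :]) / scale)
    return weights / np.abs(weights).sum(axis=1, keepdims=True)


def cosine_rows(mat, vec, eps=1e-12):
    """Cosine similarity between each row of `mat` and the vector `vec`."""
    return (mat @ vec) / (np.linalg.norm(mat, axis=1) * np.linalg.norm(vec) + eps)


def spectral_score(query, tokens, scales=(1, 2, 4, 8)):
    """Maximum cosine over all positions and all scales.

    `tokens` has shape (num_tokens, dim); `query` has shape (dim,).
    """
    best = -np.inf
    for scale in scales:
        smoothed = sinc_weights(len(tokens), scale) @ tokens
        best = max(best, float(cosine_rows(smoothed, query).max()))
    return best
```

With `scales=(1,)` the score reduces to the best per-token cosine, and with a single very large scale it approaches the mean-pool cosine. Including both ends in `scales` therefore gives a score that is never lower than either endpoint, which mirrors the interpolation property stated in the abstract.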

