IR · Apr 29

Efficient Listwise Reranking with Compressed Document Representations

arXiv:2604.26483
AI Analysis

For practitioners needing efficient reranking with LLMs, RRK offers a practical speed-quality tradeoff, though it is an incremental improvement over existing compression and listwise methods.

RRK is a listwise reranker that compresses each document into a fixed-size, multi-token embedding representation. Its 8B-parameter model runs 3x-18x faster than smaller rerankers (0.6-4B parameters) while matching or exceeding their effectiveness, with the largest gains on long documents.

Reranking, the process of refining the output from a first-stage retriever, is often considered computationally expensive, especially when using Large Language Models (LLMs). A common approach to mitigate this cost involves utilizing smaller LLMs or controlling input length. Inspired by recent advances in document compression for retrieval-augmented generation (RAG), we introduce RRK, an efficient and effective listwise reranker compressing documents into multi-token fixed-size embedding representations. Our simple training via distillation shows that this combination of rich compressed representations and listwise reranking yields a highly efficient and effective system. In particular, our 8B-parameter model runs 3x-18x faster than smaller rerankers (0.6-4B parameters) while matching or outperforming them in effectiveness. The efficiency gains are even more striking on long-document benchmarks, where RRK widens its advantage further.
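To make the intuition behind the efficiency claim concrete, the following is a minimal back-of-the-envelope sketch, not the paper's method or code. It only counts how many input positions a listwise reranker would see when each candidate document is passed as full text versus as a fixed number of embedding slots. Every name and number here (`listwise_input_length`, `slots_per_doc`, 20 candidates, 512 tokens per document, 8 slots, the query and prompt overheads) is a hypothetical chosen for illustration; none comes from the abstract.

```python
def listwise_input_length(num_docs, positions_per_doc,
                          query_tokens=32, prompt_overhead=64):
    """Input positions for one listwise prompt holding all candidates.

    All defaults are illustrative assumptions, not values from the paper.
    """
    return query_tokens + prompt_overhead + num_docs * positions_per_doc


def input_length_ratio(num_docs, doc_tokens, slots_per_doc):
    """Ratio of full-text input length to compressed input length."""
    full = listwise_input_length(num_docs, doc_tokens)
    compressed = listwise_input_length(num_docs, slots_per_doc)
    return full / compressed


# Hypothetical setting: 20 candidates, 512 tokens each, 8 slots per document.
full_len = listwise_input_length(20, 512)
compressed_len = listwise_input_length(20, 8)
ratio = input_length_ratio(20, 512, 8)
print(full_len, compressed_len, ratio)
```

This ratio describes input length only. It is not a latency prediction: the cost of producing the compressed representations and the non-linear cost of attention both affect wall-clock time, so it should not be read as reproducing the reported 3x-18x figure. It does illustrate why the advantage would be expected to grow with document length, since the compressed input stays the same size while the full-text input grows.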

