Multi-Vector Index Compression in Any Modality

Hanxiang Qin, Alexander Martin, Rohan Jha, Chunsheng Zuo, Reno Kriz, Benjamin Van Durme

arXiv:2602.21202v14.23 citations | h-index: 8

Originality: Highly original

AI Analysis

The paper addresses the efficiency of retrieval over image-, video-, and audio-rich corpora. It is an incremental improvement that includes novel methodological elements.

The paper tackles the high computation and storage costs of multi-vector retrieval with late interaction across modalities by introducing query-agnostic compression methods. Of these, attention-guided clustering outperforms the other parameterized methods and achieves competitive or improved performance relative to uncompressed indexes on BEIR, ViDoRe, MSR-VTT, and MultiVENT 2.0.

We study efficient multi-vector retrieval for late interaction in any modality. Late interaction has emerged as a dominant paradigm for information retrieval in text, images, visual documents, and videos, but its computation and storage costs grow linearly with document length, making it costly for image-, video-, and audio-rich corpora. To address this limitation, we explore query-agnostic methods for compressing multi-vector document representations under a constant vector budget. We introduce four approaches for index compression: sequence resizing, memory tokens, hierarchical pooling, and a novel attention-guided clustering (AGC). AGC uses an attention-guided mechanism to identify the most semantically salient regions of a document as cluster centroids and to weight token aggregation. Evaluating these methods on retrieval tasks spanning text (BEIR), visual-document (ViDoRe), and video (MSR-VTT, MultiVENT 2.0), we show that attention-guided clustering consistently outperforms other parameterized compression methods (sequence resizing and memory tokens), provides greater flexibility in index size than non-parametric hierarchical clustering, and achieves competitive or improved performance compared to a full, uncompressed index. The source code is available at: github.com/hanxiangqin/omni-col-press.
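
To make the abstract's idea concrete, here is a minimal pure-Python sketch of late-interaction (MaxSim) scoring and of compressing a document's token vectors to a fixed budget by salience-guided clustering. It is an illustration under stated assumptions, not the authors' implementation: the function names (`maxsim`, `compress_by_salience`), the use of dot-product assignment, and the `salience` input (standing in for attention-derived scores, which the abstract does not specify in detail) are all hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    # Scale to unit length; a zero vector is returned unchanged.
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v] if norm > 0 else list(v)

def maxsim(query_vecs, doc_vecs):
    # Late interaction: each query vector takes its best-matching
    # document vector, and the per-query maxima are summed.
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)

def compress_by_salience(doc_vecs, salience, budget):
    # ASSUMPTION: `salience` holds one positive score per token, standing
    # in for attention-derived weights. This is a sketch, not the paper's AGC.
    vecs = [normalize(v) for v in doc_vecs]
    # 1. The `budget` most salient tokens become cluster centroids.
    order = sorted(range(len(vecs)), key=lambda i: salience[i], reverse=True)
    centroid_ids = order[:budget]
    # 2. Assign every token to its most similar centroid and accumulate
    #    a salience-weighted sum per cluster.
    sums = [[0.0] * len(vecs[0]) for _ in centroid_ids]
    for i, vec in enumerate(vecs):
        best = max(range(len(centroid_ids)),
                   key=lambda c: dot(vec, vecs[centroid_ids[c]]))
        for k, a in enumerate(vec):
            sums[best][k] += salience[i] * a
    # 3. Each cluster is represented by one unit-length vector.
    return [normalize(s) for s in sums]

# Toy document: four token vectors forming two rough groups.
doc = [normalize(v) for v in [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]]
salience = [0.4, 0.1, 0.3, 0.2]
compressed = compress_by_salience(doc, salience, budget=2)

query = [[1.0, 0.0]]
full_score = maxsim(query, doc)
compressed_score = maxsim(query, compressed)
```

In this toy case the index shrinks from four vectors to two regardless of document length, and the query is scored against the compressed vectors with the same `maxsim` function. The compression never looks at the query, which is what "query-agnostic" refers to.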
