IR · AI · CL · Sep 23, 2024

Reducing the Footprint of Multi-Vector Retrieval with Minimal Performance Impact via Token Pooling

arXiv:2409.14683v1 · 22 citations · h-index: 6
Originality: Incremental advance
AI Analysis

This addresses a practical bottleneck for adopting efficient neural IR systems in real-world applications, though it is an incremental improvement on existing methods.

The paper tackles the high storage and memory requirements of multi-vector retrieval methods like ColBERT by introducing a clustering-based token pooling approach, which reduces vector counts by 50% with minimal performance loss and allows further reductions up to 75% with degradation below 5% on most datasets.

Over the last few years, multi-vector retrieval methods, spearheaded by ColBERT, have become an increasingly popular approach to Neural IR. By storing representations at the token level rather than at the document level, these methods have demonstrated very strong retrieval performance, especially in out-of-domain settings. However, the storage and memory needed to hold the large number of associated vectors remain an important drawback, hindering practical adoption. In this paper, we introduce a simple clustering-based token pooling approach to aggressively reduce the number of vectors that need to be stored. This method can reduce the storage and memory footprint of ColBERT indexes by 50% with virtually no retrieval performance degradation. It also allows for further reductions, cutting the vector count by 66% to 75%, with degradation remaining below 5% on the vast majority of datasets. Importantly, this approach requires neither architectural changes nor query-time processing, and can be used as a simple drop-in step during indexing with any ColBERT-like model.
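The abstract does not spell out the clustering procedure, so the following is only a minimal sketch of the general idea under stated assumptions: token vectors of one document are grouped by greedy agglomerative clustering with average cosine linkage, and each cluster is replaced by the mean of its members. The function name `pool_tokens` and the `pool_factor` parameter are hypothetical; here the target vector count is `n // pool_factor`, so factors 2, 3 and 4 correspond roughly to the 50%, 66% and 75% reductions quoted above. Re-normalising pooled vectors to unit length is also an assumption, made to match ColBERT-style normalised embeddings. The naive O(n³) loop is for illustration only.

```python
import math


def l2_normalize(v):
    """Return a unit-length copy of v (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine_distance(a, b):
    """1 - cosine similarity; assumes a and b are already L2-normalised."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def pool_tokens(token_vecs, pool_factor=2):
    """Hypothetical sketch: shrink a document's token vectors to
    max(1, n // pool_factor) vectors via agglomerative clustering
    (average cosine linkage) followed by mean pooling per cluster."""
    n = len(token_vecs)
    if n == 0:
        return []
    target = max(1, n // pool_factor)
    vecs = [l2_normalize(v) for v in token_vecs]
    clusters = [[i] for i in range(n)]  # start with one cluster per token

    # Repeatedly merge the two closest clusters until the target count is reached.
    while len(clusters) > target:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean pairwise distance between the two clusters.
                total = sum(cosine_distance(vecs[i], vecs[j])
                            for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a remains valid

    # Mean-pool each cluster and re-normalise (assumption, see lead-in).
    dim = len(vecs[0])
    pooled = []
    for members in clusters:
        mean = [sum(vecs[i][k] for i in members) / len(members) for k in range(dim)]
        pooled.append(l2_normalize(mean))
    return pooled


if __name__ == "__main__":
    # Toy "document" of four 2-d token vectors forming two obvious groups.
    doc = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    pooled = pool_tokens(doc, pool_factor=2)
    print(len(pooled))  # → 2
    print(f"{1 - len(pooled) / len(doc):.0%} fewer vectors")  # → 50% fewer vectors
```

Because pooling happens once per document at indexing time, queries are still scored against the (fewer) stored vectors exactly as before, which is consistent with the abstract's claim that no query-time processing is needed. A production version would more likely use an optimised hierarchical-clustering routine than this quadratic-per-step loop.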
