IR · Mar 10

A Voronoi Cell Formulation for Principled Token Pruning in Late-Interaction Retrieval Models

Yash Kankanampati, Yuxuan Zong, Nadi Tomeh, Benjamin Piwowarski, Joseph Le Roux

arXiv:2603.09933v27.3 · h-index: 25 · Has Code

Predicted impact: top 54% in IR · last 90 days · Originality: Incremental advance

AI Analysis

This work addresses storage efficiency for dense retrieval systems. It offers a principled pruning method that is competitive and interpretable, though it is an incremental improvement over existing pruning approaches.

The paper tackles the high storage overhead of late-interaction retrieval models like ColBERT by introducing a Voronoi cell-based framework for token pruning, which reduces index size while maintaining retrieval quality.

Late-interaction models like ColBERT offer competitive performance across various retrieval tasks, but they require storing a dense embedding for each document token, which leads to substantial index storage overhead. Past work addresses this by pruning low-importance token embeddings based on statistical and empirical measures, but these methods often either lack formal grounding or are ineffective. To address these shortcomings, we introduce a framework grounded in hyperspace geometry and cast token pruning as a Voronoi cell estimation problem in the embedding space. By interpreting each token's influence as a measure of its Voronoi region, our approach enables principled pruning that retains retrieval quality while reducing index size. Through our experiments, we demonstrate that this approach serves not only as a competitive pruning strategy but also as a valuable tool for improving and interpreting token-level behavior within dense retrieval systems.
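To make the idea of "influence as the measure of a Voronoi region" concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes unit-normalized token embeddings, assumes query vectors are spread uniformly over the unit sphere, and estimates each token's cell size by Monte Carlo sampling: a sampled query direction belongs to the token with the highest dot product, which is the token that would win the max-similarity step in a ColBERT-style scorer. The function names `estimate_cell_mass` and `prune_tokens`, the uniform query assumption, and the toy 2D data are all hypothetical choices made for illustration.

```python
import numpy as np


def estimate_cell_mass(doc_tokens, n_samples=10_000, seed=0):
    """Monte Carlo estimate of each token's share of the unit sphere.

    Assumption: queries are uniform on the sphere, which is a simplification.
    """
    rng = np.random.default_rng(seed)
    n_tokens, dim = doc_tokens.shape
    # Normalized Gaussian samples are uniformly distributed on the sphere.
    queries = rng.standard_normal((n_samples, dim))
    queries /= np.linalg.norm(queries, axis=1, keepdims=True)
    # Each sampled query falls in the cell of its highest-scoring token.
    winners = np.argmax(queries @ doc_tokens.T, axis=1)
    counts = np.bincount(winners, minlength=n_tokens)
    return counts / n_samples


def prune_tokens(doc_tokens, keep, **kwargs):
    """Keep the `keep` tokens with the largest estimated cells."""
    mass = estimate_cell_mass(doc_tokens, **kwargs)
    kept = np.sort(np.argsort(mass)[::-1][:keep])
    return kept, mass


# Toy 2D document: tokens at 0, 5, and 90 degrees on the unit circle.
# The token at 5 degrees is nearly redundant with the one at 0 degrees.
angles = np.deg2rad([0.0, 5.0, 90.0])
tokens = np.stack([np.cos(angles), np.sin(angles)], axis=1)

kept, mass = prune_tokens(tokens, keep=2)
print("estimated cell mass:", mass)
print("kept token indices:", kept)
```

In this toy setup the near-duplicate token is squeezed between its neighbors and ends up with the smallest cell, so it is the first candidate for pruning. A realistic variant would likely sample from an empirical query distribution rather than uniformly, since real query embeddings are not spread evenly over the sphere.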
