IR · Mar 10

A Voronoi Cell Formulation for Principled Token Pruning in Late-Interaction Retrieval Models

arXiv:2603.09933v2 · 61.2 · h-index: 3
Predicted impact: top 54% in IR · last 90 days
Originality: Incremental advance
AI Analysis

This work addresses storage efficiency in dense retrieval systems. It offers a principled pruning method that is competitive and interpretable, though it is an incremental improvement over existing pruning approaches.

The paper tackles the high storage overhead of late-interaction retrieval models like ColBERT by introducing a Voronoi cell-based framework for token pruning, which reduces index size while maintaining retrieval quality.

Late-interaction models like ColBERT offer competitive performance across various retrieval tasks but require storing a dense embedding for each document token, leading to substantial index storage overhead. Past work addresses this by pruning low-importance token embeddings based on statistical and empirical measures, but these methods often either lack formal grounding or are ineffective. To address these shortcomings, we introduce a framework grounded in hyperspace geometry and cast token pruning as a Voronoi cell estimation problem in the embedding space. By interpreting each token's influence as a measure of its Voronoi region, our approach enables principled pruning that retains retrieval quality while reducing index size. Our experiments demonstrate that this approach serves not only as a competitive pruning strategy but also as a valuable tool for improving and interpreting token-level behavior within dense retrieval systems.
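The idea of measuring a token's influence by the size of its Voronoi region can be illustrated with a small Monte Carlo sketch. This is not the paper's estimator: it assumes unit-normalized embeddings, dot-product similarity, and query directions drawn uniformly from the unit sphere (real query embeddings are not uniform). The function names `maxsim`, `estimate_cell_shares`, and `prune` are hypothetical and chosen only for this illustration.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_tokens, doc_tokens):
    """Late-interaction score: each query token is matched to its
    most similar document token, and the maxima are summed."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


def estimate_cell_shares(doc_tokens, n_samples=2000, seed=0):
    """Estimate, for each document token, the fraction of query
    directions for which it is the best match (its cell's share).

    Assumption: query directions are uniform on the unit sphere.
    """
    rng = random.Random(seed)
    dim = len(doc_tokens[0])
    wins = [0] * len(doc_tokens)
    for _ in range(n_samples):
        # A normalized Gaussian vector is uniform on the sphere.
        q = normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])
        best = max(range(len(doc_tokens)), key=lambda i: dot(q, doc_tokens[i]))
        wins[best] += 1
    return [w / n_samples for w in wins]


def prune(doc_tokens, keep, **kwargs):
    """Return the indices of the `keep` tokens with the largest
    estimated cell shares, in their original order."""
    shares = estimate_cell_shares(doc_tokens, **kwargs)
    ranked = sorted(range(len(doc_tokens)), key=lambda i: shares[i], reverse=True)
    return sorted(ranked[:keep])


if __name__ == "__main__":
    # Four 2-D unit vectors at 0, 10, 20 and 180 degrees. The token at
    # 10 degrees is squeezed between its neighbours, so its cell is small.
    angles = (0, 10, 20, 180)
    tokens = [[math.cos(math.radians(a)), math.sin(math.radians(a))] for a in angles]
    print(estimate_cell_shares(tokens))
    print(prune(tokens, keep=3))
```

In this toy setup the token at 10 degrees wins only the queries that fall between its two close neighbours, so it receives the smallest share and is the first candidate for removal. A token that is rarely the argmax in `maxsim` contributes little to the score, which is the intuition behind pruning by cell size.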

