IR · LG · Apr 6

Spike Hijacking in Late-Interaction Retrieval

Karthik Suresh, Tushar Vatsa, Tracy King, Asim Kadav, Michael Friedrich

arXiv:2604.05253 · 33.0 · h-index: 3

AI Analysis

This work addresses a structural issue in late-interaction retrieval that is relevant to AI/ML practitioners, but its contribution is incremental because it builds on existing pooling methods.

The study examines hard maximum similarity (MaxSim) pooling in late-interaction retrieval models. On synthetic and real-world benchmarks, it shows that MaxSim concentrates gradients on a few document patches and that its performance degrades more sharply than smoother pooling variants as document length increases.

Late-interaction retrieval models rely on hard maximum similarity (MaxSim) to aggregate token-level similarities. Although effective, this winner-take-all pooling rule may structurally bias training dynamics. We provide a mechanistic study of gradient routing and robustness in MaxSim-based retrieval. In a controlled synthetic environment with in-batch contrastive training, we demonstrate that MaxSim induces significantly higher patch-level gradient concentration than smoother alternatives such as Top-k pooling and softmax aggregation. While sparse routing can improve early discrimination, it also increases sensitivity to document length: as the number of document patches grows, MaxSim degrades more sharply than mild smoothing variants. We corroborate these findings on a real-world multi-vector retrieval benchmark, where controlled document-length sweeps reveal similar brittleness under hard max pooling. Together, our results isolate pooling-induced gradient concentration as a structural property of late-interaction retrieval and highlight a sparsity-robustness tradeoff. These findings motivate principled alternatives to hard max pooling in multi-vector retrieval systems.
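To make the three pooling rules named in the abstract concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes a toy similarity matrix of random values, and the function names (`routing_weights`, `concentration`), the choice of `k`, the temperature, and the concentration measure are all illustrative assumptions. For the softmax variant it assumes a log-sum-exp smooth maximum, whose gradient with respect to the similarities is exactly the softmax weights; the paper's softmax aggregation may be defined differently.

```python
import numpy as np

def routing_weights(sim, mode="max", k=3, temperature=0.1):
    """Per-patch weights for each query token (rows sum to 1).

    sim: (num_query_tokens, num_doc_patches) token-to-patch similarities.
    For "max" and "topk" the weights are the gradient of the pooled score
    with respect to sim (wherever there are no ties). For "softmax" they are
    the gradient of the assumed smooth max: temperature * logsumexp(sim / temperature).
    """
    num_q, num_p = sim.shape
    weights = np.zeros_like(sim, dtype=float)
    if mode == "max":
        # Hard MaxSim: all gradient goes to the single best patch.
        weights[np.arange(num_q), sim.argmax(axis=1)] = 1.0
    elif mode == "topk":
        # Top-k pooling: gradient is split evenly over the k best patches.
        k = min(k, num_p)
        top_idx = np.argpartition(-sim, k - 1, axis=1)[:, :k]
        np.put_along_axis(weights, top_idx, 1.0 / k, axis=1)
    elif mode == "softmax":
        # Softmax weights, computed stably by subtracting the row max.
        z = (sim - sim.max(axis=1, keepdims=True)) / temperature
        e = np.exp(z)
        weights = e / e.sum(axis=1, keepdims=True)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return weights

def concentration(weights):
    """Illustrative measure: mean share of gradient mass on each token's top patch."""
    return float(weights.max(axis=1).mean())

rng = np.random.default_rng(0)
sim = rng.normal(size=(4, 16))  # 4 query tokens, 16 document patches (toy values)

for mode in ("max", "topk", "softmax"):
    w = routing_weights(sim, mode=mode)
    print(f"{mode:8s} concentration = {concentration(w):.3f}")
```

Under this measure hard max always scores 1.0 and top-k scores 1/k, while the softmax value depends on the temperature and on how close the top similarities are. Adding more columns to `sim` mimics a longer document: hard max still routes each query token's entire gradient to one patch.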
