IR · AI · CL · LG · Mar 17

OPERA: Online Data Pruning for Efficient Retrieval Model Adaptation

arXiv:2603.17205 · 57.5 · h-index: 13
Predicted impact: top 61% in IR · last 90 days
Originality: Incremental advance
AI Analysis

This addresses efficiency bottlenecks in retrieval model adaptation for practitioners, though it is incremental over existing pruning methods.

The paper tackles inefficient domain-specific finetuning of dense retrievers by introducing OPERA, a data pruning framework that improves both effectiveness and efficiency. Dynamic pruning yields the strongest gains in ranking (NDCG@10 +1.9%) and retrieval (Recall@20 +0.7%), and it reaches performance comparable to standard finetuning in less than 50% of the training time.
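For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of their textbook definitions. It is not taken from the paper: the function names, the linear-gain form of NDCG, and the choice to divide recall by the total number of relevant documents are assumptions made for illustration, and the paper's evaluation toolkit may differ in such details.

```python
import math


def recall_at_k(ranked_ids, relevant_ids, k=20):
    """Fraction of all relevant documents that appear in the top k results.

    Assumes ranked_ids contains no duplicates.
    """
    if not relevant_ids:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)


def ndcg_at_k(ranked_ids, gains, k=10):
    """NDCG with linear gains: DCG of the ranking divided by the ideal DCG.

    gains maps doc_id -> graded relevance; unlisted documents count as 0.
    """
    dcg = sum(
        gains.get(doc_id, 0.0) / math.log2(rank + 2)
        for rank, doc_id in enumerate(ranked_ids[:k])
    )
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(rank + 2) for rank, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0
```

Recall rewards finding relevant documents anywhere in the top k, whereas NDCG also rewards placing them near the top, which is why a change in training data can move the two metrics in different directions.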

Domain-specific finetuning is essential for dense retrievers, yet not all training pairs contribute equally to the learning process. We introduce OPERA, a data pruning framework that exploits this heterogeneity to improve both the effectiveness and efficiency of retrieval model adaptation. We first investigate static pruning (SP), which retains only high-similarity query-document pairs, revealing an intrinsic quality-coverage tradeoff: ranking (NDCG) improves while retrieval (Recall) can degrade due to reduced query diversity. To resolve this tradeoff, we propose a two-stage dynamic pruning (DP) strategy that adaptively modulates sampling probabilities at both query and document levels throughout training, prioritizing high-quality examples while maintaining access to the full training set. Evaluations across eight datasets spanning six domains demonstrate the effectiveness of both approaches: SP improves ranking over standard finetuning (NDCG@10 +0.5%), while DP achieves the strongest performance on both ranking (NDCG@10 +1.9%) and retrieval (Recall@20 +0.7%), with an average rank of 1.38 across all methods. These findings scale to Qwen3-Embedding, an LLM-based dense retriever, confirming architecture-agnostic benefits. Notably, DP reaches comparable performance in less than 50% of the training time required by standard finetuning.
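The contrast between static and dynamic pruning described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the scoring function, the pruning threshold, or the schedule used to update probabilities, so `keep_fraction`, the softmax weighting, `temperature`, and the use of each query's best pair score as its query-level weight are all hypothetical choices made here for illustration.

```python
import math
import random


def softmax(scores, temperature=1.0):
    """Turn scores into sampling probabilities; every item keeps probability > 0."""
    top = max(scores)
    weights = [math.exp((s - top) / temperature) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def static_prune(pairs, keep_fraction=0.5):
    """Static pruning: keep only the highest-similarity (query, doc, score) pairs.

    keep_fraction is a hypothetical parameter, not a value from the paper.
    """
    n_keep = max(1, round(len(pairs) * keep_fraction))
    return sorted(pairs, key=lambda p: p[2], reverse=True)[:n_keep]


def sample_pair(scores_by_query, temperature=1.0, rng=random):
    """Two-stage draw: pick a query, then one of that query's documents.

    scores_by_query maps query -> {doc: similarity score}. Nothing is
    discarded; low-scoring pairs are just drawn less often.
    """
    queries = list(scores_by_query)
    # Stage 1 (assumption): weight each query by its best pair score.
    best = [max(scores_by_query[q].values()) for q in queries]
    query = rng.choices(queries, weights=softmax(best, temperature), k=1)[0]
    # Stage 2 (assumption): weight that query's documents by their pair scores.
    docs = list(scores_by_query[query])
    doc_scores = [scores_by_query[query][d] for d in docs]
    doc = rng.choices(docs, weights=softmax(doc_scores, temperature), k=1)[0]
    return query, doc


if __name__ == "__main__":
    pairs = [("q1", "d1", 0.9), ("q1", "d2", 0.1), ("q2", "d3", 0.5), ("q3", "d4", 0.7)]
    print(static_prune(pairs, keep_fraction=0.5))
```

With the toy data in the demo, static pruning keeps the pairs for `q1` and `q3` and drops `q2` entirely, a small instance of the reduced query diversity the abstract mentions. The sampling variant keeps `q2` reachable with lower probability; in a real training loop the scores would presumably be refreshed as the model changes, which this sketch does not model.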

