IR · May 10, 2021

Not All Relevance Scores are Equal: Efficient Uncertainty and Calibration Modeling for Deep Retrieval Models

arXiv:2105.04651v1 · 43 citations
AI Analysis

This work addresses the need for uncertainty modeling in retrieval systems to enhance robustness across new distributions and improve downstream tasks, representing an incremental advancement in retrieval methodology.

The paper tackles the problem of retrieval models lacking uncertainty estimates for their relevance scores, proposing an efficient Bayesian framework that captures model belief with minimal computational overhead. The result is significantly improved ranking effectiveness and confidence calibration, demonstrated through risk-aware reranking and reliable downstream task performance.

In any ranking system, the retrieval model outputs a single score for a document, reflecting its belief about how relevant the document is to a given search query. While retrieval models have continued to improve with the introduction of increasingly complex architectures, few works have investigated a retrieval model's belief in the score beyond the scope of a single value. We argue that capturing the model's uncertainty with respect to its own scoring of a document is a critical aspect of retrieval: it allows for greater use of current models across new document distributions and collections, and can even improve effectiveness on downstream tasks. In this paper, we address this problem via an efficient Bayesian framework for retrieval models, which captures the model's belief in the relevance score through a stochastic process while adding only negligible computational overhead. We evaluate this belief via a ranking-based calibration metric, showing that our approximate Bayesian framework significantly improves a retrieval model's ranking effectiveness through risk-aware reranking, as well as its confidence calibration. Lastly, we demonstrate that this additional uncertainty information is actionable and reliable on downstream tasks, represented here by cutoff prediction.
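The idea of risk-aware reranking can be illustrated with a minimal sketch. The abstract does not give the paper's exact formulation, so everything here is an assumption made for illustration: each document is assumed to have several sampled relevance scores (for example, from repeated stochastic forward passes), and the rerank score is the sample mean minus a penalty proportional to the sample standard deviation. The function name `risk_aware_rerank`, the parameter `risk_aversion`, and the example scores are all hypothetical.

```python
import statistics


def risk_aware_rerank(score_samples, risk_aversion=1.0):
    """Rank documents by mean sampled score minus a spread penalty.

    score_samples: dict mapping a document id to a list of sampled
        relevance scores (hypothetical input, not the paper's interface).
    risk_aversion: weight on the standard deviation; 0 ranks by mean only.
    """
    adjusted = {}
    for doc_id, samples in score_samples.items():
        mean = statistics.fmean(samples)
        spread = statistics.pstdev(samples)  # population standard deviation
        adjusted[doc_id] = mean - risk_aversion * spread
    # Highest adjusted score first.
    return sorted(adjusted, key=adjusted.get, reverse=True)


# Illustrative, made-up samples: doc_a has the highest mean but is unstable.
score_samples = {
    "doc_a": [0.90, 0.30, 0.95, 0.25],
    "doc_b": [0.58, 0.56, 0.57, 0.55],
    "doc_c": [0.20, 0.22, 0.18, 0.20],
}

by_mean = risk_aware_rerank(score_samples, risk_aversion=0.0)
risk_averse = risk_aware_rerank(score_samples, risk_aversion=1.0)
```

With `risk_aversion=0.0` the ordering follows the mean score alone, so `doc_a` ranks first. With `risk_aversion=1.0` the large spread in `doc_a`'s samples pushes it below the more consistently scored `doc_b`, which is the kind of trade-off a risk-aware reranker is meant to expose.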

Code Implementations: 1 repo
