IR | Oct 23, 2019

Context-Aware Sentence/Passage Term Importance Estimation For First Stage Retrieval

arXiv:1910.10687v2 | 216 citations
Originality: Incremental advance
AI Analysis

This work addresses the computational cost that confines deep neural ranking models to later ranking stages. It offers information retrieval practitioners an incremental improvement by bringing deep contextual understanding into first-stage retrieval.

The paper tackles the problem of weak term frequency signals in first-stage retrieval. It proposes a Deep Contextualized Term Weighting (DeepCT) framework that learns context-aware term weights from BERT representations, and it reports improved retrieval accuracy on four datasets.

Term frequency is a common method for identifying the importance of a term in a query or document. But it is a weak signal, especially when the frequency distribution is flat, such as in long queries or short documents where the text is of sentence/passage-length. This paper proposes a Deep Contextualized Term Weighting framework that learns to map BERT's contextualized text representations to context-aware term weights for sentences and passages. When applied to passages, DeepCT-Index produces term weights that can be stored in an ordinary inverted index for passage retrieval. When applied to query text, DeepCT-Query generates a weighted bag-of-words query. Both types of term weight can be used directly by typical first-stage retrieval algorithms. This is novel because most deep neural network based ranking models have higher computational costs, and thus are restricted to later-stage rankers. Experiments on four datasets demonstrate that DeepCT's deep contextualized text understanding greatly improves the accuracy of first-stage retrieval algorithms.
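The idea of storing learned term weights in an ordinary inverted index and scoring a weighted bag-of-words query can be illustrated with a minimal sketch. It does not run BERT: the per-passage weights are hard-coded hypothetical values standing in for model output, and the scaling factor `SCALE`, the helper names, and the BM25 parameters are assumptions made for illustration, not details taken from the abstract above.

```python
import math
from collections import defaultdict

# Hypothetical per-passage term weights in [0, 1]. In a real system these
# would come from a trained term-weighting model; here they are made up.
PREDICTED_WEIGHTS = {
    "p1": {"stomach": 0.9, "pain": 0.7, "doctor": 0.1},
    "p2": {"stomach": 0.1, "anatomy": 0.8, "organ": 0.6},
}

SCALE = 100  # assumed factor for turning real-valued weights into integers


def to_pseudo_tf(weights, scale=SCALE):
    """Scale and round weights to integer pseudo term frequencies.

    Terms whose weight rounds to zero are dropped from the passage.
    """
    tf = {}
    for term, weight in weights.items():
        count = int(round(weight * scale))
        if count > 0:
            tf[term] = count
    return tf


def build_index(passage_weights):
    """Build an inverted index: term -> {passage_id: pseudo term frequency}."""
    index = defaultdict(dict)
    lengths = {}
    for pid, weights in passage_weights.items():
        tf = to_pseudo_tf(weights)
        lengths[pid] = sum(tf.values())
        for term, count in tf.items():
            index[term][pid] = count
    return index, lengths


def bm25(query_weights, index, lengths, k1=1.2, b=0.75):
    """Score passages with BM25 over pseudo term frequencies.

    query_weights maps each query term to a weight, so a weighted
    bag-of-words query multiplies each term's contribution.
    """
    n = len(lengths)
    avg_len = sum(lengths.values()) / n
    scores = defaultdict(float)
    for term, q_weight in query_weights.items():
        postings = index.get(term, {})
        df = len(postings)
        if df == 0:
            continue
        idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
        for pid, tf in postings.items():
            norm = tf + k1 * (1 - b + b * lengths[pid] / avg_len)
            scores[pid] += q_weight * idf * tf * (k1 + 1) / norm
    return dict(scores)


index, lengths = build_index(PREDICTED_WEIGHTS)
scores = bm25({"stomach": 1.0, "pain": 1.0}, index, lengths)
```

Because the weights are stored as integer counts in a standard postings structure, the scorer itself is unchanged: a passage where "stomach" is central ends up with a much larger pseudo frequency than one that only mentions it in passing.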

Code Implementations | 2 repos
