Passage Ranking with Weak Supervision
This work addresses the challenge of reducing reliance on labeled data for ranking tasks, a contribution that is incremental but impactful for information retrieval applications.
The paper tackles neural passage ranking by proposing a weak supervision framework that leverages multiple weak signals. Without ground-truth training labels, the resulting models outperform the BM25 baseline by a large margin on all three datasets and beat the previous fully supervised state-of-the-art results on two of them.
In this paper, we propose a \textit{weak supervision} framework for neural ranking tasks based on the data programming paradigm \citep{Ratner2016}, which enables us to leverage multiple weak supervision signals from different sources. Empirically, we consider two sources of weak supervision signals: unsupervised ranking functions and semantic feature similarities. We train a BERT-based passage-ranking model, BERT-PR (which achieves new state-of-the-art performance on two benchmark datasets with full supervision), in our weak supervision framework. Without using ground-truth training labels, the BERT-PR models outperform the BM25 baseline by a large margin on all three datasets and even beat the previous state-of-the-art results with full supervision on two of the datasets.
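To make the idea of combining several weak signals concrete, the following is a minimal sketch rather than the framework used in the paper. It assumes two hypothetical signal sources (a BM25-style score and an embedding similarity), hypothetical thresholds, and fixed source accuracies chosen by hand. In data programming the accuracies would instead be estimated from the agreements and disagreements among sources, without labels. The names `lf_from_score` and `soft_label` are illustrative only.

```python
import math

RELEVANT, IRRELEVANT, ABSTAIN = 1, -1, 0


def lf_from_score(score, low, high):
    """Turn a raw ranking score into a vote; abstain between the thresholds."""
    if score >= high:
        return RELEVANT
    if score <= low:
        return IRRELEVANT
    return ABSTAIN


def soft_label(votes, accuracies):
    """P(relevant | votes), assuming independent sources, a uniform prior,
    and known per-source accuracies (an assumption made for this sketch)."""
    log_odds = sum(v * math.log(a / (1.0 - a)) for v, a in zip(votes, accuracies))
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical scores for a single query-passage pair.
votes = [
    lf_from_score(12.3, low=2.0, high=8.0),  # stand-in for a BM25 score
    lf_from_score(0.41, low=0.2, high=0.7),  # stand-in for a cosine similarity
]
accuracies = [0.7, 0.6]  # assumed, not learned
p_relevant = soft_label(votes, accuracies)
```

Here the first source votes "relevant" and the second abstains, so the soft label equals the assumed accuracy of the first source. Such probabilistic labels could then serve as training targets for a ranking model in place of ground-truth judgments.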