CL, AI, IR, LG | Mar 11, 2022

LaPraDoR: Unsupervised Pretrained Dense Retriever for Zero-Shot Text Retrieval

arXiv:2203.06169v2660 citations | h-index: 72
AI Analysis

This paper addresses the need for text retrieval systems that work without labeled training data, and it represents a significant advance rather than an incremental improvement.

The authors tackle zero-shot text retrieval without supervised training data by proposing LaPraDoR, an unsupervised pretrained dense retriever. On the BEIR benchmark (18 datasets), it achieves state-of-the-art performance compared with supervised dense retrieval models, and its lexicon-enhanced approach runs 22.5x faster than re-ranking methods.

In this paper, we propose LaPraDoR, a pretrained dual-tower dense retriever that does not require any supervised data for training. Specifically, we first present Iterative Contrastive Learning (ICoL) that iteratively trains the query and document encoders with a cache mechanism. ICoL not only enlarges the number of negative instances but also keeps representations of cached examples in the same hidden space. We then propose Lexicon-Enhanced Dense Retrieval (LEDR) as a simple yet effective way to enhance dense retrieval with lexical matching. We evaluate LaPraDoR on the recently proposed BEIR benchmark, including 18 datasets of 9 zero-shot text retrieval tasks. Experimental results show that LaPraDoR achieves state-of-the-art performance compared with supervised dense retrieval models, and further analysis reveals the effectiveness of our training strategy and objectives. Compared to re-ranking, our lexicon-enhanced approach can be run in milliseconds (22.5x faster) while achieving superior performance.
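The two ideas in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the temperature parameter, the cache size, and the toy vectors are all hypothetical. The abstract does not say how LEDR combines lexical and dense scores, so the multiplication used here is an assumption made for illustration. The first function shows a contrastive loss whose negatives include embeddings drawn from a bounded cache, which is the kind of mechanism ICoL uses to enlarge the negative set. The second re-weights dense similarity scores with lexical (for example BM25) scores.

```python
import math
from collections import deque


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def contrastive_loss_with_cache(query, positive, negatives, cache, temperature=1.0):
    """InfoNCE-style loss for one query.

    The positive competes against the in-batch negatives plus every
    embedding currently held in the cache (hypothetical helper).
    """
    candidates = [positive] + list(negatives) + list(cache)
    logits = [dot(query, c) / temperature for c in candidates]
    # Numerically stable log-sum-exp over all candidates.
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
    # Negative log-probability of the positive (index 0).
    return log_z - logits[0]


def lexicon_enhanced_rank(dense_scores, lexical_scores, top_k=3):
    """Rank document ids by dense score times lexical score.

    Multiplication is an assumption; documents without a lexical
    score are treated as having a lexical score of 0.
    """
    combined = {
        doc_id: dense_scores[doc_id] * lexical_scores.get(doc_id, 0.0)
        for doc_id in dense_scores
    }
    return sorted(combined, key=combined.get, reverse=True)[:top_k]


# A bounded cache: the oldest embeddings are evicted once maxlen is reached.
cache = deque(maxlen=2)
query, positive, in_batch_negatives = [1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]]

loss_without_cache = contrastive_loss_with_cache(query, positive, in_batch_negatives, cache)
cache.append([0.0, 1.0])  # embedding kept from an earlier batch
loss_with_cache = contrastive_loss_with_cache(query, positive, in_batch_negatives, cache)

ranking = lexicon_enhanced_rank(
    dense_scores={"d1": 0.9, "d2": 0.8, "d3": 0.2},
    lexical_scores={"d1": 1.0, "d2": 5.0, "d3": 4.0},
)
```

In this toy run the extra cached negative raises the loss, since the positive must now be separated from more candidates. Document `d2` moves ahead of `d1` because its strong lexical match outweighs a slightly lower dense score.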

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
