IR, CL, LG, NI | Aug 5, 2023

Towards Consistency Filtering-Free Unsupervised Learning for Dense Retrieval

arXiv:2308.02926v1 | h-index: 14
Originality: Synthesis-oriented
AI Analysis

This work addresses efficiency and domain transfer challenges in information retrieval for practitioners, though it appears incremental as it compares existing unsupervised methods rather than introducing fundamentally new techniques.

The paper tackles the computational expense and domain adaptation limitations of consistency filtering in unsupervised dense retrieval by evaluating alternative methods like pseudo-relevance feedback, finding that TextRank-based approaches outperform others while maintaining or improving efficiency and performance.

Domain transfer is a prevalent challenge in modern neural Information Retrieval (IR). To overcome this problem, previous research has used domain-specific manual annotations and synthetic data produced by consistency filtering to fine-tune a general ranker into a domain-specific ranker. However, training such consistency filters is computationally expensive, which significantly reduces model efficiency. In addition, consistency filtering often struggles to identify retrieval intentions and to recognize query and corpus distributions in a target domain. In this study, we evaluate a more efficient solution: replacing the consistency filter with direct pseudo-labeling, pseudo-relevance feedback, or unsupervised keyword generation to achieve consistency filtering-free unsupervised dense retrieval. Our extensive experimental evaluations demonstrate that, on average, TextRank-based pseudo-relevance feedback outperforms the other methods. Furthermore, we analyze the training and inference efficiency of the proposed paradigm. The results indicate that filtering-free unsupervised learning can consistently improve training and inference efficiency while maintaining retrieval performance. In some cases, it can even improve performance on particular datasets.
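To make the keyword-generation idea in the abstract concrete, the following is a minimal sketch of TextRank-style keyword extraction used to build a pseudo-query from an unlabeled document. It is not the authors' implementation: the function names `textrank_keywords` and `pseudo_query`, the tiny stopword list, the window size, and the damping factor are all assumptions chosen for illustration, and the abstract does not specify how the extracted keywords enter training.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption; real systems use a fuller one).
STOPWORDS = {
    "the", "a", "an", "of", "and", "to", "in", "is", "for", "on",
    "with", "by", "that", "this", "are", "as", "it", "be",
}


def textrank_keywords(text, top_k=5, window=3, damping=0.85, iterations=50):
    """Return up to top_k words ranked by a basic, unweighted TextRank."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

    # Build an undirected co-occurrence graph: words are linked when they
    # appear within `window` positions of each other.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)
    if not neighbors:
        return []

    # Iterate the PageRank update; every node in the graph has at least one
    # neighbor, so the division below is always defined.
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        scores = {
            w: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[w])
            for w in neighbors
        }

    # Sort by score (descending), breaking ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]


def pseudo_query(document, top_k=5):
    """Join the top keywords into a pseudo-query for the source document."""
    return " ".join(textrank_keywords(document, top_k=top_k))
```

Under these assumptions, calling `pseudo_query(doc)` on each corpus document would yield (pseudo-query, document) pairs without a separately trained consistency filter; how such pairs are then used to fine-tune the ranker is left to the paper itself.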

