CL · Aug 5, 2022

Low-Resource Dense Retrieval for Open-Domain Question Answering: A Comprehensive Survey

arXiv:2208.03197v1 · 25 citations · h-index: 26
Originality: Synthesis-oriented
AI Analysis

It addresses the problem that manual annotation does not scale by giving researchers and practitioners a structured overview for choosing appropriate low-resource techniques. However, it is incremental because it synthesizes existing work.

This paper surveys techniques for dense retrieval in open-domain question answering under low-resource scenarios. It categorizes methods by the resources they require (documents, questions, or question-answer pairs) and outlines future research directions.
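As a rough illustration of how the resource-based categories map onto a concrete scenario, the sketch below picks the category whose requirements are met with the most supervision available. The `Resources` class and `low_resource_category` function are hypothetical helpers and do not come from the paper. The sketch also assumes two simplifications: having question-answer pairs implies having questions, and techniques from a lower category remain usable when richer resources exist.

```python
from dataclasses import dataclass


@dataclass
class Resources:
    """Which training resources are available (hypothetical helper, not from the paper)."""
    documents: bool
    questions: bool
    qa_pairs: bool


def low_resource_category(r: Resources) -> int:
    """Return the richest of the three categories the available resources satisfy.

    1 = only documents, 2 = documents + questions,
    3 = documents + question-answer pairs.
    Assumption: lower categories stay applicable, so this picks the one
    that exploits the most supervision.
    """
    if not r.documents:
        # Every category in the taxonomy needs a document collection.
        raise ValueError("all three categories require a document collection")
    if r.qa_pairs:
        return 3
    if r.questions:
        return 2
    return 1
```

For example, a team with a document collection and a log of unanswered user questions would get `low_resource_category(Resources(True, True, False)) == 2`.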

Dense retrieval (DR) approaches based on powerful pre-trained language models (PLMs) have achieved significant advances and have become a key component of modern open-domain question-answering systems. However, they require large amounts of manual annotation to perform competitively, and collecting such annotation is infeasible to scale. To address this, a growing body of research has recently focused on improving DR performance under low-resource scenarios. These works differ in the resources they require for training and employ a diverse set of techniques. Understanding such differences is crucial for choosing the right technique for a specific low-resource scenario. To facilitate this understanding, we provide a thorough, structured overview of mainstream techniques for low-resource DR. Based on their required resources, we divide the techniques into three main categories: (1) only documents are needed; (2) documents and questions are needed; and (3) documents and question-answer pairs are needed. For every technique, we introduce its general-form algorithm and highlight its open issues, pros, and cons. Promising directions for future research are outlined.
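A minimal sketch of the dual-encoder scoring that DR builds on follows. Questions and passages are encoded independently into vectors, and passages are ranked by the inner product with the question vector. `toy_encode` is a hypothetical stand-in for a PLM encoder. It uses a hashed, L2-normalized bag of words chosen only so the example runs without model weights. A real DR system would use trained PLM encoders, and training those encoders with little annotation is what the surveyed low-resource techniques target.

```python
import hashlib
import math
from typing import List, Tuple

DIM = 64  # embedding size of the toy encoder (arbitrary assumption)


def toy_encode(text: str) -> List[float]:
    """Hypothetical stand-in for a PLM encoder: hashed bag of words, L2-normalized."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        # md5 gives a deterministic hash (Python's built-in hash() is salted per process).
        h = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16)
        vec[h % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def score(q: List[float], p: List[float]) -> float:
    """Dual-encoder relevance score: inner product of question and passage vectors."""
    return sum(a * b for a, b in zip(q, p))


def retrieve(question: str, passages: List[str], k: int = 2) -> List[Tuple[int, float]]:
    """Rank passages by inner product with the question and return the top k (index, score)."""
    index = [toy_encode(p) for p in passages]  # in a real system, built once offline
    q = toy_encode(question)
    ranked = sorted(((i, score(q, p)) for i, p in enumerate(index)), key=lambda x: -x[1])
    return ranked[:k]
```

For example, calling `retrieve("what is the capital of france", passages, k=1)` over a small list in which one passage mentions the capital of France should rank that passage first. Because passage vectors do not depend on the question, they can be precomputed and indexed once. This is what makes DR practical at scale.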
