CL AI IR LG · Nov 17, 2022

Data-Efficient Autoregressive Document Retrieval for Fact Verification

arXiv:2211.09388v11.68 citations · h-index: 14

Originality: Incremental advance

AI Analysis

This paper addresses data efficiency for NLP researchers and practitioners by reducing the annotation needed to train retrievers for tasks such as fact verification. The advance is incremental, since it builds on existing autoregressive retrieval methods.

The paper tackles the problem of training autoregressive document retrievers for fact verification without human annotation by introducing a distant-supervision method. The resulting retrievers achieve competitive R-Precision and Recall in a zero-shot setting. With task-specific supervised fine-tuning, performance on two Wikipedia-based fact verification tasks approaches or exceeds full supervision while using less than 1/4 of the annotated data.
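For readers unfamiliar with the two metrics named above, here is a minimal sketch of how R-Precision and Recall are commonly computed for page-level retrieval. The function names and the toy page titles are illustrative assumptions, and the paper's own evaluation scripts may differ in detail (for example, in how multiple evidence sets are handled).

```python
from typing import Iterable, List


def r_precision(ranked: List[str], gold: Iterable[str]) -> float:
    """Share of gold pages found in the top-R results, where R = number of gold pages."""
    gold_set = set(gold)
    if not gold_set:
        return 0.0
    top_r = set(ranked[: len(gold_set)])
    return len(gold_set & top_r) / len(gold_set)


def recall_at_k(ranked: List[str], gold: Iterable[str], k: int) -> float:
    """Share of gold pages found anywhere in the top-k results."""
    gold_set = set(gold)
    if not gold_set:
        return 0.0
    return len(gold_set & set(ranked[:k])) / len(gold_set)


# Toy example (hypothetical data): two gold pages, four retrieved titles.
ranked_titles = ["Paris", "France", "Eiffel Tower", "Lyon"]
gold_titles = ["Paris", "Eiffel Tower"]

print(r_precision(ranked_titles, gold_titles))      # 0.5
print(recall_at_k(ranked_titles, gold_titles, 3))   # 1.0
```

R-Precision only looks at the first R results, so it penalizes a retriever that ranks a gold page below an irrelevant one, while Recall at a larger cutoff does not.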

Document retrieval is a core component of many knowledge-intensive natural language processing task formulations such as fact verification and question answering. Sources of textual knowledge, such as Wikipedia articles, condition the generation of answers from the models. Recent advances in retrieval use sequence-to-sequence models to incrementally predict the title of the appropriate Wikipedia page given a query. However, this method requires supervision in the form of human annotation to label which Wikipedia pages contain appropriate context. This paper introduces a distant-supervision method that does not require any annotation to train autoregressive retrievers that attain competitive R-Precision and Recall in a zero-shot setting. Furthermore, we show that with task-specific supervised fine-tuning, autoregressive retrieval performance for two Wikipedia-based fact verification tasks can approach or even exceed full supervision using less than $1/4$ of the annotated data, indicating possible directions for data-efficient autoregressive retrieval.
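To make "incrementally predict the title" concrete, here is a minimal sketch of token-by-token title generation restricted to valid page titles. The abstract does not describe the decoding procedure; this sketch assumes a prefix-trie constraint of the kind used by earlier autoregressive retrievers such as GENRE. The whitespace tokenization, the `overlap_score` function standing in for a trained sequence-to-sequence model, and the three titles are all hypothetical.

```python
from typing import Callable, Dict, List

EOS = "</s>"  # end-of-title marker (assumed symbol)


def build_trie(titles: List[str]) -> Dict:
    """Index every title as a path of whitespace tokens ending in EOS."""
    trie: Dict = {}
    for title in titles:
        node = trie
        for token in title.split() + [EOS]:
            node = node.setdefault(token, {})
    return trie


def allowed_next(trie: Dict, prefix: List[str]) -> List[str]:
    """Tokens that keep the prefix on a path to some valid title."""
    node = trie
    for token in prefix:
        node = node.get(token)
        if node is None:
            return []
    return sorted(node)


def greedy_decode(trie: Dict, score: Callable[[List[str], str], float]) -> str:
    """Pick the best-scoring allowed token at each step until EOS is chosen."""
    prefix: List[str] = []
    while True:
        options = allowed_next(trie, prefix)
        if not options:  # only happens for an empty trie
            return " ".join(prefix)
        best = max(options, key=lambda tok: score(prefix, tok))
        if best == EOS:
            return " ".join(prefix)
        prefix.append(best)


def overlap_score(query: str) -> Callable[[List[str], str], float]:
    """Toy stand-in for a model: prefer tokens that appear in the query."""
    words = set(query.lower().split())
    return lambda prefix, tok: 0.5 if tok == EOS else float(tok.lower() in words)


trie = build_trie(["Barack Obama", "Barack Obama Sr.", "Michelle Obama"])
print(greedy_decode(trie, overlap_score("Michelle Obama was born in Chicago")))
print(greedy_decode(trie, overlap_score("Barack Obama Sr. was an economist")))
```

Because each step can only choose tokens present in the trie, the decoder can never emit a title that does not exist in the index. A real system would replace the greedy loop with beam search over model log-probabilities to return several candidate pages.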
