CL · IR · Jul 1, 2020

Relevance-guided Supervision for OpenQA with ColBERT

Omar Khattab, Christopher Potts, Matei Zaharia

arXiv:2007.00814v2 · 27.6 · 689 citations · Has Code

Originality: Incremental advance

AI Analysis

The paper addresses the difficulty OpenQA systems have in handling complex natural language questions. It represents an incremental improvement that introduces a novel training strategy.

The paper tackles the limited expressiveness of learned retrievers for Open-Domain Question Answering. It adapts ColBERT, which models fine-grained interactions between questions and passages, and reports state-of-the-art extractive OpenQA performance on Natural Questions, SQuAD, and TriviaQA.
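The contrast between coarse-grained (single-vector) and fine-grained retrieval can be illustrated with a minimal sketch. It assumes token embeddings have already been produced by some encoder (ColBERT uses BERT; here short hand-written vectors stand in). It compares a mean-pooled single-vector score against ColBERT-style late interaction, in which each question token is matched to its most similar passage token (MaxSim) and those maxima are summed. Mean pooling is only an illustrative single-vector representation, not necessarily the retrievers the paper compares against. The function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    # Scale a vector to unit L2 norm so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def mean_pool(vectors: Sequence[Vector]) -> List[float]:
    # Collapse a bag of token embeddings into one vector (coarse-grained).
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def single_vector_score(q_tokens: Sequence[Vector], p_tokens: Sequence[Vector]) -> float:
    # Coarse-grained relevance: one vector per question and one per passage.
    return dot(normalize(mean_pool(q_tokens)), normalize(mean_pool(p_tokens)))


def late_interaction_score(q_tokens: Sequence[Vector], p_tokens: Sequence[Vector]) -> float:
    # Fine-grained relevance (MaxSim): each question token keeps only its
    # best-matching passage token, and the per-token maxima are summed.
    p_normed = [normalize(p) for p in p_tokens]
    total = 0.0
    for q in q_tokens:
        q_normed = normalize(q)
        total += max(dot(q_normed, p) for p in p_normed)
    return total


# Toy usage: two passages that look identical after pooling.
question = [[1.0, 0.0], [0.0, 1.0]]
passage_a = [[1.0, 1.0]]
passage_b = [[1.0, 0.0], [0.0, 1.0]]
print(single_vector_score(question, passage_a), single_vector_score(question, passage_b))
print(late_interaction_score(question, passage_a), late_interaction_score(question, passage_b))
```

Both passages pool to the same direction, so `single_vector_score` ties them at 1.0, up to floating-point rounding. `late_interaction_score` gives `passage_b` 2.0 and `passage_a` about 1.41, because every question token finds an exact match only in `passage_b`. This token-level matching is the extra expressiveness that a single pooled vector cannot capture.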

Systems for Open-Domain Question Answering (OpenQA) generally depend on a retriever for finding candidate passages in a large corpus and a reader for extracting answers from those passages. In much recent work, the retriever is a learned component that uses coarse-grained vector representations of questions and passages. We argue that this modeling choice is insufficiently expressive for dealing with the complexity of natural language questions. To address this, we define ColBERT-QA, which adapts the scalable neural retrieval model ColBERT to OpenQA. ColBERT creates fine-grained interactions between questions and passages. We propose an efficient weak supervision strategy that iteratively uses ColBERT to create its own training data. This greatly improves OpenQA retrieval on Natural Questions, SQuAD, and TriviaQA, and the resulting system attains state-of-the-art extractive OpenQA performance on all three datasets.
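The iterative weak-supervision idea can be sketched as a loop. In each round, the current retriever labels its own top-ranked passages, and a new retriever is trained on the resulting (question, positive, negative) triples. This is a minimal sketch under stated assumptions, not the paper's exact procedure:

* the answer-string matching heuristic and the cutoffs `k_pos` and `k_neg` are hypothetical;
* the helper names (`contains_answer`, `build_triples`, `relevance_guided_rounds`, `train_fn`) are also hypothetical;
* the retriever and the training step are left as caller-supplied functions.

```python
from typing import Callable, List, Sequence, Tuple

# (question, positive passage, negative passage)
Triple = Tuple[str, str, str]
Retriever = Callable[[str], List[str]]


def contains_answer(passage: str, answers: Sequence[str]) -> bool:
    # Hypothetical weak-labeling heuristic: case-insensitive answer-string match.
    text = passage.lower()
    return any(answer.lower() in text for answer in answers)


def build_triples(question: str, answers: Sequence[str], ranked: List[str],
                  k_pos: int = 3, k_neg: int = 10) -> List[Triple]:
    # Positives: answer-bearing passages among the top k_pos results.
    positives = [p for p in ranked[:k_pos] if contains_answer(p, answers)]
    # Negatives: highly ranked passages that contain none of the answers.
    negatives = [p for p in ranked[:k_neg] if not contains_answer(p, answers)]
    return [(question, pos, neg) for pos in positives for neg in negatives]


def relevance_guided_rounds(train_set: Sequence[Tuple[str, Sequence[str]]],
                            retriever: Retriever,
                            train_fn: Callable[[List[Triple]], Retriever],
                            rounds: int = 2) -> Retriever:
    # Each round, the current retriever labels its own rankings, and a new
    # retriever is trained on those triples (train_fn is a hypothetical trainer).
    for _ in range(rounds):
        triples: List[Triple] = []
        for question, answers in train_set:
            triples.extend(build_triples(question, answers, retriever(question)))
        retriever = train_fn(triples)
    return retriever


# Toy usage: a fixed ranking and a stub "trainer" that records triple counts.
ranked = ["Hamlet is a play by William Shakespeare.",
          "Hamlet is a Danish prince.",
          "Macbeth is set in Scotland."]
seen_counts: List[int] = []


def stub_train(triples: List[Triple]) -> Retriever:
    seen_counts.append(len(triples))
    return lambda question: ranked


final = relevance_guided_rounds([("who wrote hamlet", ["Shakespeare"])],
                                lambda question: ranked, stub_train)
```

Keeping the retriever and trainer as plain callables keeps the sketch independent of any particular ColBERT codebase. In a real system, `train_fn` would fine-tune a retriever on the triples and return it, so later rounds draw their labels from a stronger model.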

