Learning to Attend On Essential Terms: An Enhanced Retriever-Reader Model for Open-domain Question Answering
This paper addresses evidence retrieval for complex questions in open-domain QA; its contribution is an incremental advance over existing methods.
The paper tackles the challenge of retrieving indirectly related evidence for complex open-domain questions by proposing a retriever-reader model that learns to attend on essential terms. The model performs at the level of the state of the art on the AI2 Reasoning Challenge (ARC) dataset.
Open-domain question answering remains a challenging task as it requires models that are capable of understanding questions and answers, collecting useful information, and reasoning over evidence. Previous work typically formulates this task as a reading comprehension or entailment problem given evidence retrieved from search engines. However, existing techniques struggle to retrieve indirectly related evidence when no directly related evidence is provided, especially for complex questions where it is hard to parse precisely what the question asks. In this paper we propose a retriever-reader model that learns to attend on essential terms during the question answering process. We build (1) an essential term selector which first identifies the most important words in a question, then reformulates the query and searches for related evidence; and (2) an enhanced reader that distinguishes between essential terms and distracting words to predict the answer. We evaluate our model on multiple open-domain multiple-choice QA datasets, notably performing at the level of the state-of-the-art on the AI2 Reasoning Challenge (ARC) dataset.
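The query reformulation step described above can be illustrated with a minimal sketch. It assumes the essential term selector has already assigned each question token a score in [0, 1]. The scores, the 0.5 threshold, and the function names `select_essential_terms` and `build_query` are all hypothetical, chosen for illustration and not taken from the paper. Appending the candidate answer to the query is likewise one plausible design, not necessarily the authors' exact procedure.

```python
from typing import List, Tuple


def select_essential_terms(
    scored_tokens: List[Tuple[str, float]], threshold: float = 0.5
) -> List[str]:
    """Keep tokens whose essentiality score meets the threshold, preserving order.

    The scores stand in for the output of a learned selector (hypothetical here).
    """
    return [token for token, score in scored_tokens if score >= threshold]


def build_query(essential_terms: List[str], choice: str) -> str:
    """Form a search query from the essential terms plus one candidate answer."""
    return " ".join(essential_terms + [choice])


# Illustrative, made-up scores for one question; a real selector would predict them.
scored_question = [
    ("Which", 0.05),
    ("gas", 0.90),
    ("do", 0.02),
    ("plants", 0.85),
    ("absorb", 0.80),
    ("during", 0.10),
    ("photosynthesis", 0.95),
]

terms = select_essential_terms(scored_question)
query = build_query(terms, "carbon dioxide")
print(query)
```

In this sketch, one query is built per answer choice, so that the evidence retrieved for each choice can be passed to the reader separately. Dropping low-scoring words such as "Which" and "during" keeps the query focused on the terms that carry the question's meaning.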