Answering Ambiguous Questions with a Database of Questions, Answers, and Revisions
This work addresses the challenge of handling ambiguous questions in open-domain QA. The contribution is incremental: it builds on existing retrieval methods and adds a novel database of generated questions.
The paper tackles the problem of answering ambiguous open-domain questions by using a database of unambiguous questions generated from Wikipedia. On the ASQA benchmark it reports a 15% relative improvement on recall measures and a 10% improvement on disambiguation measures.
Many open-domain questions are under-specified and thus have multiple possible answers, each of which is correct under a different interpretation of the question. Answering such ambiguous questions is challenging, as it requires retrieving and then reasoning about diverse information from multiple passages. We present a new state-of-the-art for answering ambiguous questions that exploits a database of unambiguous questions generated from Wikipedia. On the challenging ASQA benchmark, which requires generating long-form answers that summarize the multiple answers to an ambiguous question, our method improves performance by 15% (relative improvement) on recall measures and 10% on measures which evaluate disambiguating questions from predicted outputs. Retrieving from the database of generated questions also gives large improvements in diverse passage retrieval (by matching user questions q to passages p indirectly, via questions q' generated from p).
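The indirect retrieval idea in the last sentence can be illustrated with a minimal sketch. It is not the paper's implementation: the token-overlap (Jaccard) score stands in for whatever retriever is actually used, and the names `GeneratedQuestion`, `retrieve_passages`, the passage ids, and the toy questions are all hypothetical. It only shows the shape of the lookup: score the user question q against generated questions q', then map each match back to the passage p it was generated from.

```python
from collections import namedtuple

# Hypothetical record: a generated question q' and the id of its source passage p.
GeneratedQuestion = namedtuple("GeneratedQuestion", ["text", "passage_id"])


def tokenize(text):
    """Lowercase, drop question marks, and split on whitespace."""
    return set(text.lower().replace("?", "").split())


def jaccard(a, b):
    """Token-overlap similarity; an assumed stand-in for a learned retriever."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def retrieve_passages(user_question, question_db, k=2):
    """Rank passages by the best-matching generated question for each passage."""
    best = {}
    for gq in question_db:
        score = jaccard(user_question, gq.text)
        # Keep only the highest-scoring q' per passage, so that several
        # questions from one passage do not crowd out other passages.
        if score > best.get(gq.passage_id, 0.0):
            best[gq.passage_id] = score
    ranked = sorted(best, key=lambda pid: best[pid], reverse=True)
    return ranked[:k]


# Toy database of generated questions (illustrative data, not from the paper).
question_db = [
    GeneratedQuestion("Who won the FIFA World Cup in 2014?", "p1"),
    GeneratedQuestion("Who won the FIFA Women's World Cup in 2015?", "p2"),
    GeneratedQuestion("Where was the 2014 FIFA World Cup held?", "p1"),
    GeneratedQuestion("What is the capital of France?", "p3"),
]

top = retrieve_passages("Who won the World Cup?", question_db, k=2)
print(top)
```

Because each passage is scored by its single best-matching question, the ambiguous query surfaces passages for two different interpretations (men's and women's tournaments) rather than two hits from the same passage, which is the kind of diversity the abstract refers to.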