Open Domain Question Answering with A Unified Knowledge Interface
This work addresses the challenge of integrating diverse knowledge sources for open-domain question answering. The contribution is incremental: it builds on the existing retriever-reader framework by adding a unified knowledge interface.
The paper tackles the problem of accessing heterogeneous knowledge sources in open-domain question answering. It proposes a verbalizer-retriever-reader framework that uses data-to-text methods to convert structured data into text, so that tables, knowledge graphs, and passages can be searched together. This yields large gains over text-only baselines and sets a single-model state of the art on Natural Questions.
The retriever-reader framework is popular for open-domain question answering (ODQA) because it can use explicit knowledge. Although prior work has sought to increase knowledge coverage by incorporating structured knowledge beyond text, accessing heterogeneous knowledge sources through a unified interface remains an open question. While data-to-text generation has the potential to serve as a universal interface for data and text, its feasibility for downstream tasks remains largely unknown. In this work, we bridge this gap and use data-to-text generation to encode structured knowledge for ODQA. Specifically, we propose a verbalizer-retriever-reader framework for ODQA over data and text, in which verbalized tables from Wikipedia and verbalized graphs from Wikidata serve as augmented knowledge sources. We show that our Unified Data and Text QA, UDT-QA, can effectively benefit from the expanded knowledge index, leading to large gains over text-only baselines. Notably, our approach sets the single-model state of the art on Natural Questions. Furthermore, our analyses indicate that verbalized knowledge is preferred for answer reasoning in both adapted and hot-swap settings.
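The verbalizer and retriever stages can be illustrated with a minimal Python sketch under clearly stated assumptions. It is not the UDT-QA implementation. Fixed templates (the hypothetical `verbalize_triple` and `verbalize_table_row`) stand in for a trained data-to-text model. A toy TF-IDF-style lexical scorer (the hypothetical `UnifiedIndex`) stands in for a learned retriever. The reader stage is omitted entirely. What the sketch does show is the structural idea: once KB triples and table rows are verbalized, they sit in the same passage index as ordinary text and are retrieved through a single interface.

```python
import math
import re
from collections import Counter


# ASSUMPTION: fixed templates stand in for a trained data-to-text verbalizer.
def verbalize_triple(subject: str, relation: str, obj: str) -> str:
    """Render a (subject, relation, object) KB triple as one sentence."""
    return f"{subject} {relation} {obj}."


def verbalize_table_row(title: str, header: list[str], row: list[str]) -> str:
    """Render one table row as 'column is value' clauses prefixed by the table title."""
    clauses = [f"{col} is {val}" for col, val in zip(header, row)]
    return f"{title}: " + ", ".join(clauses) + "."


def tokenize(text: str) -> list[str]:
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class UnifiedIndex:
    """Hypothetical toy lexical index over passages from any source.

    ASSUMPTION: TF-IDF-style term weighting stands in for a learned retriever.
    """

    def __init__(self, passages: list[str]):
        self.passages = passages
        self.term_counts = [Counter(tokenize(p)) for p in passages]
        n = len(passages)
        # Document frequency: number of passages containing each term.
        df = Counter(t for counts in self.term_counts for t in counts)
        # Smoothed inverse document frequency.
        self.idf = {t: math.log((n + 1) / (c + 1)) + 1.0 for t, c in df.items()}

    def search(self, query: str, k: int = 2) -> list[str]:
        """Return the k passages with the highest summed tf * idf over query terms."""
        q_terms = tokenize(query)
        scores = [
            sum(counts[t] * self.idf.get(t, 0.0) for t in q_terms)
            for counts in self.term_counts
        ]
        ranked = sorted(range(len(self.passages)), key=lambda i: scores[i], reverse=True)
        return [self.passages[i] for i in ranked[:k]]


# Usage: plain text, a verbalized KB triple, and a verbalized table row share one index.
text_passages = ["Paris is the capital and largest city of France."]
kb_passages = [verbalize_triple("Mount Everest", "has elevation", "8,849 metres")]
table_passages = [
    verbalize_table_row(
        "List of tallest buildings",
        ["building", "city", "height"],
        ["Burj Khalifa", "Dubai", "828 m"],
    )
]
index = UnifiedIndex(text_passages + kb_passages + table_passages)
print(index.search("What is the height of Burj Khalifa?", k=1))
# ['List of tallest buildings: building is Burj Khalifa, city is Dubai, height is 828 m.']
```

In this sketch every source is reduced to plain passages before indexing, so replacing the template verbalizer with a different one changes only the text that enters the index; the retrieval code is untouched. That separation is one way to read the paper's notion of verbalized text as a unified interface.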