CL · AI · Oct 23, 2020

Unsupervised Multi-hop Question Answering by Question Generation

arXiv:2010.12623v2742 citations · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the resource-intensive nature of obtaining training data for multi-hop QA by offering a method that reduces reliance on human annotations. It is an incremental advance, however, because it builds on existing unsupervised and data generation techniques.

The paper tackles the problem of training multi-hop question answering models without human-labeled data. It proposes MQA-QG, an unsupervised framework that generates synthetic training data from homogeneous and heterogeneous data sources; models trained only on this data reach 61% and 83% of supervised performance on the HybridQA and HotpotQA datasets, respectively.

Obtaining training data for multi-hop question answering (QA) is time-consuming and resource-intensive. We explore the possibility of training a well-performing multi-hop QA model without referencing any human-labeled multi-hop question-answer pairs, i.e., unsupervised multi-hop QA. We propose MQA-QG, an unsupervised framework that can generate human-like multi-hop training data from both homogeneous and heterogeneous data sources. MQA-QG generates questions by first selecting or generating relevant information from each data source and then integrating these pieces of information to form a multi-hop question. Using only generated training data, we can train a competent multi-hop QA model that achieves 61% and 83% of the supervised learning performance on the HybridQA and HotpotQA datasets, respectively. We also show that pretraining the QA system with the generated data greatly reduces the demand for human-annotated training data. Our code is publicly available at https://github.com/teacherpeterpan/Unsupervised-Multi-hop-QA.
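
The two-stage idea in the abstract (take information from each source, then integrate it into one multi-hop question) can be illustrated with a toy sketch. This is not the authors' implementation: the names `TableRow`, `TextFact`, `find_bridge`, `describe_entity`, and `make_multihop_example` are hypothetical, the single-hop question is assumed to be given rather than produced by a question generation model, and a string template stands in for any learned component.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TableRow:
    """Toy stand-in for one table cell plus the entity its row describes."""
    entity: str       # subject of the row
    entity_type: str  # coarse type used in the description, e.g. "person"
    attribute: str    # column header of the cell
    value: str        # cell content


@dataclass
class TextFact:
    """Toy stand-in for a text passage with a single-hop question about it."""
    sentence: str     # source sentence
    question: str     # single-hop question (assumed to be given)
    answer: str       # answer to that question


def find_bridge(row: TableRow, fact: TextFact) -> Optional[str]:
    """Return the row entity if the single-hop question mentions it."""
    return row.entity if row.entity in fact.question else None


def describe_entity(row: TableRow) -> str:
    """First hop: describe the entity through the table without naming it."""
    return f"the {row.entity_type} whose {row.attribute} is {row.value}"


def make_multihop_example(row: TableRow, fact: TextFact) -> Optional[dict]:
    """Second hop: swap the entity name in the question for its description."""
    bridge = find_bridge(row, fact)
    if bridge is None:
        return None  # the two sources share no entity, so nothing to combine
    question = fact.question.replace(bridge, describe_entity(row))
    return {"question": question, "answer": fact.answer, "bridge": bridge}


row = TableRow("Marie Curie", "person", "birthplace", "Warsaw")
fact = TextFact(
    sentence="Marie Curie won the Nobel Prize in Physics in 1903.",
    question="In which year did Marie Curie win the Nobel Prize in Physics?",
    answer="1903",
)
example = make_multihop_example(row, fact)
print(example["question"])
print(example["answer"])
```

Answering the generated question requires resolving the description against the table before reading the text, which is what makes it a two-hop question. A real system would need far more robust entity matching and question rewriting than the exact string match and template used here.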

Code Implementations: 1 repo
