Unsupervised multiple choices question answering via universal corpus
This paper addresses the problem of reducing the annotation burden for question answering in new domains, though it appears incremental because it builds on existing unsupervised methods.
The paper tackles unsupervised multiple-choice question answering by generating synthetic data from universal domain contexts without manual annotation, using named entities and knowledge graphs to create distractors, and demonstrates effectiveness on multiple datasets.
Unsupervised question answering is a promising yet challenging task that alleviates the burden of building large-scale annotated data in a new domain. This motivates us to study the unsupervised multiple-choice question answering (MCQA) problem. In this paper, we propose a novel framework that generates synthetic MCQA data based solely on contexts from the universal domain, without relying on any form of manual annotation. Possible answers are extracted and used to produce related questions; we then leverage both named entities (NE) and knowledge graphs to discover plausible distractors and form complete synthetic samples. Experiments on multiple MCQA datasets demonstrate the effectiveness of our method.
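The three stages described above (answer extraction, question generation, distractor discovery) can be illustrated with a minimal sketch. This is not the paper's implementation: the `ENTITY_TYPES` dictionary is a hypothetical stand-in for both an NE tagger and a knowledge graph, the cloze-style blanking is a simple substitute for a real question generator, and all function names are assumptions made for illustration.

```python
import random
import re

# Hypothetical toy lookup: entity -> type. It stands in for both a named-entity
# tagger and a knowledge graph; a real system would query external resources.
ENTITY_TYPES = {
    "Paris": "city",
    "Berlin": "city",
    "Madrid": "city",
    "Rome": "city",
    "France": "country",
    "Germany": "country",
    "Spain": "country",
    "Italy": "country",
}


def extract_answers(context):
    """Return known entities found in the context, in order of first appearance."""
    found = []
    for token in re.findall(r"[A-Za-z]+", context):
        if token in ENTITY_TYPES and token not in found:
            found.append(token)
    return found


def make_cloze_question(context, answer):
    """Blank out the first mention of the answer (assumed cloze-style question)."""
    pattern = r"\b" + re.escape(answer) + r"\b"
    return re.sub(pattern, "_____", context, count=1)


def find_distractors(answer, rng, k=3):
    """Sample up to k other entities that share the answer's type."""
    answer_type = ENTITY_TYPES[answer]
    candidates = sorted(
        entity
        for entity, entity_type in ENTITY_TYPES.items()
        if entity_type == answer_type and entity != answer
    )
    return rng.sample(candidates, min(k, len(candidates)))


def build_samples(context, seed=0):
    """Build one synthetic multiple-choice sample per extracted answer."""
    rng = random.Random(seed)  # seeded so the option order is reproducible
    samples = []
    for answer in extract_answers(context):
        options = find_distractors(answer, rng) + [answer]
        rng.shuffle(options)
        samples.append(
            {
                "question": make_cloze_question(context, answer),
                "options": options,
                "label": options.index(answer),  # index of the correct option
            }
        )
    return samples


if __name__ == "__main__":
    for sample in build_samples("Paris is the capital of France."):
        print(sample["question"], sample["options"], sample["label"])
```

Restricting distractors to entities of the same type as the answer is what keeps them plausible in this sketch; a knowledge graph would allow finer-grained choices, such as entities that share a relation with the answer.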