CL · AI · Jan 3, 2023

PIE-QG: Paraphrased Information Extraction for Unsupervised Question Generation from Small Corpora

arXiv:2301.01064v1 · 290 citations · h-index: 29
Originality: Incremental advance
AI Analysis

This work addresses the need for unsupervised QA systems that reduce reliance on large labeled datasets. It is an incremental advance, since it builds on existing OpenIE and BERT methods.

The paper tackles the problem of training question answering systems without labeled data. It generates synthetic question-answer pairs from paraphrased passages using Open Information Extraction and reaches performance comparable to state-of-the-art systems while training on an order of magnitude fewer documents and using no external reference data.

Supervised question answering (QA) systems rely on domain-specific, human-labeled data for training. Unsupervised QA systems generate their own question-answer training pairs, typically by drawing on secondary knowledge sources. Our approach (called PIE-QG) uses Open Information Extraction (OpenIE) to generate synthetic training questions from paraphrased passages, and uses the resulting question-answer pairs as training data for the language model of a state-of-the-art BERT-based QA system. Triples of the form <subject, predicate, object> are extracted from each passage. Questions are formed from the subject (or object) and the predicate, and the object (or subject) serves as the answer. Experiments on five extractive QA datasets show that our technique performs on par with existing state-of-the-art QA systems, with the benefit of being trained on an order of magnitude fewer documents and without recourse to any external reference data sources.
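To make the triple-to-question step concrete, here is a minimal sketch, not the authors' implementation. It assumes the triples have already been produced by an OpenIE system (they are hand-written below), and the question templates ("... what?", "Who or what ...?") as well as the helper names `Triple`, `triple_to_qa_pairs` and `to_squad_examples` are hypothetical. Each triple yields two pairs: subject + predicate with the object as answer, and predicate + object with the subject as answer. The answer is then located as a literal span in the passage to build a SQuAD-style extractive record.

```python
from dataclasses import dataclass


@dataclass
class Triple:
    """One OpenIE-style <subject, predicate, object> triple (assumed given)."""
    subject: str
    predicate: str
    obj: str


def triple_to_qa_pairs(t: Triple) -> list[tuple[str, str]]:
    """Form two (question, answer) pairs from one triple.

    The templates here are hypothetical placeholders, not PIE-QG's own.
    """
    return [
        # Subject + predicate form the question; the object is the answer.
        (f"{t.subject} {t.predicate} what?", t.obj),
        # Predicate + object form the question; the subject is the answer.
        (f"Who or what {t.predicate} {t.obj}?", t.subject),
    ]


def to_squad_examples(context: str, triples: list[Triple]) -> list[dict]:
    """Build SQuAD-style extractive examples from one passage and its triples."""
    examples = []
    for t in triples:
        for question, answer in triple_to_qa_pairs(t):
            # Simplification: take the first literal occurrence of the answer.
            start = context.find(answer)
            if start == -1:
                continue  # answer is not an exact span of the context; skip it
            examples.append({
                "context": context,
                "question": question,
                "answers": {"text": [answer], "answer_start": [start]},
            })
    return examples


if __name__ == "__main__":
    passage = "Marie Curie discovered polonium in 1898."
    # Hand-written triple standing in for real OpenIE output.
    triples = [Triple("Marie Curie", "discovered", "polonium")]
    for ex in to_squad_examples(passage, triples):
        print(ex["question"], "->", ex["answers"])
```

For this passage the sketch yields "Marie Curie discovered what?" with answer "polonium" at offset 23, and "Who or what discovered polonium?" with answer "Marie Curie" at offset 0. Skipping answers that are not exact spans keeps every record valid for an extractive reader; a real pipeline would need more careful span alignment than `str.find`.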

