CL, Oct 4, 2020

Tell Me How to Ask Again: Question Data Augmentation with Controllable Rewriting in Continuous Space

arXiv:2010.01475v11001 citations
Originality: Incremental advance
AI Analysis

This paper addresses data scarcity in question-based NLP tasks with a new augmentation approach. The advance is incremental because the method is assembled from existing models.

The paper tackles the problem of generating diverse, context-relevant question data for machine reading comprehension and related tasks. It proposes CRQDA, a method that rewrites questions in a continuous embedding space, and reports improved performance on benchmarks such as SQuAD 2.0 and QNLI.

In this paper, we propose a novel data augmentation method, referred to as Controllable Rewriting based Question Data Augmentation (CRQDA), for machine reading comprehension (MRC), question generation, and question-answering natural language inference tasks. We treat the question data augmentation task as a constrained question rewriting problem to generate context-relevant, high-quality, and diverse question data samples. CRQDA utilizes a Transformer autoencoder to map the original discrete question into a continuous embedding space. It then uses a pre-trained MRC model to revise the question representation iteratively with gradient-based optimization. Finally, the revised question representations are mapped back into the discrete space, where they serve as additional question data. Comprehensive experiments on SQuAD 2.0, SQuAD 1.1 question generation, and QNLI tasks demonstrate the effectiveness of CRQDA.
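The three steps in the abstract (encode into continuous space, revise with gradients from a scoring model, map back to tokens) can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration and is not the authors' implementation: the 2-D embedding table `VOCAB` is hand-picked, a fixed logistic scorer (`W`, `B`) stands in for the pre-trained MRC model, a nearest-neighbour lookup stands in for the Transformer decoder, and the "answerable" label being flipped is hypothetical.

```python
import math

# Hypothetical toy vocabulary with hand-picked 2-D embeddings.
VOCAB = {
    "who": [1.0, 0.0],
    "wrote": [0.8, 0.4],
    "painted": [-0.6, 0.7],
    "hamlet": [0.9, -0.2],
    "guernica": [-0.9, 0.1],
}

# Stand-in for the pre-trained MRC model (an assumption): a fixed logistic
# scorer over the mean question embedding. Output near 1 means "answerable".
W = [2.0, 0.5]
B = 0.0


def encode(tokens):
    """Map discrete tokens to a list of continuous vectors (copies)."""
    return [list(VOCAB[t]) for t in tokens]


def answerable_prob(embs):
    """Score a question from the mean of its token embeddings."""
    n = len(embs)
    mean = [sum(e[d] for e in embs) / n for d in range(len(W))]
    z = sum(w * m for w, m in zip(W, mean)) + B
    return 1.0 / (1.0 + math.exp(-z))


def revise(embs, target, step=0.5, max_iters=50, threshold=0.5):
    """Gradient descent on the cross-entropy loss w.r.t. the embeddings.

    The scorer's weights stay frozen; only the question representation moves.
    Stops as soon as the predicted label matches the target label.
    """
    embs = [list(e) for e in embs]  # work on a copy
    n = len(embs)
    for _ in range(max_iters):
        p = answerable_prob(embs)
        if (p > threshold) == (target > threshold):
            break
        # For a logistic scorer on the mean embedding:
        # d(loss)/d(e_i[d]) = (p - target) * W[d] / n
        for e in embs:
            for d in range(len(W)):
                e[d] -= step * (p - target) * W[d] / n
    return embs


def decode(embs):
    """Map each revised vector back to its nearest vocabulary token."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(VOCAB, key=lambda t: sq_dist(VOCAB[t], e)) for e in embs]


question = ["who", "wrote", "hamlet"]
original = encode(question)
revised = revise(original, target=0.0)  # push towards "unanswerable"
rewritten = decode(revised)
print(answerable_prob(original), answerable_prob(revised), rewritten)
```

The point of the sketch is the control flow: the scoring model is frozen, its gradient is taken with respect to the question representation rather than the model weights, and the discrete question is recovered only at the end. With hand-picked 2-D vectors the decoded tokens carry no linguistic meaning; a learned autoencoder would be needed for that.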

Code Implementations: 1 repo
