Clozer: Adaptable Data Augmentation for Cloze-style Reading Comprehension
This work addresses data scarcity in machine reading comprehension, a concern for both researchers and practitioners, but the contribution is incremental because it builds on existing task-adaptive pre-training approaches.
The paper tackles the scarcity of labeled data in cloze-style reading comprehension by proposing Clozer, a sequence-tagging based cloze answer extraction method for task-adaptive pre-training (TAPT). On multiple-choice cloze-style tasks, the authors report that Clozer makes TAPT significantly more effective at lifting model performance than the oracle and state-of-the-art methods do.
Task-adaptive pre-training (TAPT) alleviates the lack of labelled data and provides a performance lift by adapting unlabelled data to the downstream task. Unfortunately, existing adaptations mainly rely on deterministic rules that do not generalize well. Here, we propose Clozer, a sequence-tagging based cloze answer extraction method used in TAPT that can be extended to any cloze-style machine reading comprehension (MRC) downstream task. We experiment on multiple-choice cloze-style MRC tasks and show that Clozer performs significantly better than the oracle and the state-of-the-art in increasing the effectiveness of TAPT at lifting model performance. We also show that Clozer is able to recognize the gold answers independently of any heuristics.
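To make the idea of sequence-tagging based cloze answer extraction concrete, the following is a minimal sketch, not the paper's implementation. It assumes a tagger has already labelled each token with 1 (part of a candidate answer) or 0, and it turns every contiguous tagged span into one cloze example by blanking that span. The tags here are written by hand for illustration; the names `PLACEHOLDER`, `tagged_spans`, and `make_cloze_examples` are hypothetical and do not come from the paper.

```python
from typing import List, Tuple

# Hypothetical blank marker; the paper's actual placeholder is not specified here.
PLACEHOLDER = "@placeholder"


def tagged_spans(tags: List[int]) -> List[Tuple[int, int]]:
    """Return (start, end) pairs, end exclusive, for each contiguous run of 1-tags."""
    spans = []
    start = None
    for i, tag in enumerate(tags):
        if tag == 1 and start is None:
            start = i  # a new answer span begins
        elif tag != 1 and start is not None:
            spans.append((start, i))  # the current span ends before token i
            start = None
    if start is not None:
        spans.append((start, len(tags)))  # span runs to the end of the sequence
    return spans


def make_cloze_examples(tokens: List[str], tags: List[int]) -> List[Tuple[str, str]]:
    """Build one (cloze_text, answer) pair per tagged span by blanking the span."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    examples = []
    for start, end in tagged_spans(tags):
        answer = " ".join(tokens[start:end])
        cloze = tokens[:start] + [PLACEHOLDER] + tokens[end:]
        examples.append((" ".join(cloze), answer))
    return examples


# Hand-written tags standing in for the output of a trained sequence tagger.
tokens = "Marie Curie won the Nobel Prize in 1903".split()
tags = [1, 1, 0, 0, 0, 0, 0, 1]
examples = make_cloze_examples(tokens, tags)
for cloze_text, answer in examples:
    print(cloze_text, "->", answer)
```

In a TAPT setting, pairs like these would be generated from unlabelled text and used as additional cloze-style training data before fine-tuning on the downstream task. How candidate answer options are constructed for the multiple-choice format is left out of this sketch.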