CL | Oct 12, 2022

Improving Question Answering with Generation of NQ-like Questions

arXiv:2210.06599v1 | 1 citation | h-index: 8
Originality: Synthesis-oriented
AI Analysis

This work addresses data scarcity for QA systems by generating synthetic data, but it is incremental as it adapts existing datasets rather than introducing a new paradigm.

The paper tackles the problem of costly annotated data for Question Answering (QA) systems by proposing an algorithm to automatically generate Natural Questions (NQ)-like questions from Quizbowl (QB) trivia questions, improving QA performance in low-resource settings over baseline systems on both NQ and QB data.

Question Answering (QA) systems require a large amount of annotated data, which is costly and time-consuming to gather. Converting datasets from existing QA benchmarks is challenging because their formats and complexities differ. To address these issues, we propose an algorithm that automatically generates shorter questions, resembling the day-to-day human communication found in the Natural Questions (NQ) dataset, from longer trivia questions in the Quizbowl (QB) dataset by leveraging style conversion between the datasets. This provides an automated way to generate more data for our QA systems. To ensure quality as well as quantity of data, we detect and remove ill-formed questions using a neural classifier. We demonstrate that, in a low-resource setting, using the generated data improves QA performance over the baseline system on both NQ and QB data. Our algorithm improves the scalability of training data while maintaining data quality for QA systems.
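
The abstract describes a two-stage idea: convert long QB questions into short NQ-like candidates, then discard ill-formed ones with a classifier. The sketch below illustrates only that generate-then-filter shape. It is not the paper's algorithm: the conversion rule (`to_short_question`) is a toy heuristic, the scorer (`toy_well_formed_score`) is a rule-based stand-in for the neural classifier, and every function name and the `threshold` parameter are hypothetical.

```python
import re
from typing import Callable, List

WH_WORDS = ("who", "what", "which", "when", "where", "how")


def split_clues(qb_question: str) -> List[str]:
    """Split a multi-sentence trivia question into single-sentence clues."""
    return [s for s in re.split(r"(?<=[.!?])\s+", qb_question.strip()) if s]


def to_short_question(clue: str) -> str:
    """Toy style conversion (assumption): turn a leading 'This' into 'which',
    lowercase the text, and end it with a question mark."""
    text = clue.strip().rstrip(".!?")
    text = re.sub(r"^this\b", "which", text, flags=re.IGNORECASE)
    return text.lower() + "?"


def toy_well_formed_score(question: str) -> float:
    """Rule-based stand-in for a neural well-formedness classifier (assumption):
    1.0 if the question starts with a wh-word and has at least 4 tokens."""
    tokens = question.rstrip("?").split()
    starts_ok = bool(tokens) and tokens[0] in WH_WORDS
    return 1.0 if starts_ok and len(tokens) >= 4 else 0.0


def generate_questions(
    qb_question: str,
    scorer: Callable[[str], float] = toy_well_formed_score,
    threshold: float = 0.5,
) -> List[str]:
    """Generate short candidates from each clue, keep those the scorer accepts."""
    candidates = [to_short_question(c) for c in split_clues(qb_question)]
    return [q for q in candidates if scorer(q) >= threshold]


if __name__ == "__main__":
    qb = (
        "This author wrote Hamlet and Macbeth. "
        "He was born in Stratford-upon-Avon. "
        "This playwright married Anne Hathaway."
    )
    for question in generate_questions(qb):
        print(question)
```

In a realistic setup, `scorer` would be replaced by a trained model that returns a well-formedness probability. Passing it in as a callable keeps the filtering step independent of how candidates are generated.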

