CL · Oct 12, 2020

End-to-End Synthetic Data Generation for Domain Adaptation of Question Answering Systems

arXiv:2010.06028v11017 citations
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of adapting QA systems to new domains. The advance is rated incremental because it builds on existing synthetic data generation techniques.

The paper tackles domain adaptation for question answering systems by proposing an end-to-end synthetic data generation method. The authors report significant improvements over current state-of-the-art methods.

We propose an end-to-end approach for synthetic QA data generation. Our model comprises a single transformer-based encoder-decoder network that is trained end-to-end to generate both answers and questions. In a nutshell, we feed a passage to the encoder and ask the decoder to generate a question and an answer token-by-token. The likelihood produced in the generation process is used as a filtering score, which avoids the need for a separate filtering model. Our generator is trained by fine-tuning a pretrained LM using maximum likelihood estimation. The experimental results indicate significant improvements in the domain adaptation of QA models, outperforming current state-of-the-art methods.
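The likelihood-based filtering described in the abstract can be illustrated with a minimal sketch. It assumes the generator has already produced question-answer pairs together with per-token log-probabilities. The `GeneratedQA` class, the function names, the toy samples, and the choice of a summed log-likelihood as the score are hypothetical illustrations, not the authors' implementation; the abstract does not say which tokens are scored or how scores are normalized.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GeneratedQA:
    """A synthetic QA pair plus the decoder's per-token log-probabilities.

    Hypothetical container for illustration; not taken from the paper's code.
    """
    question: str
    answer: str
    token_logprobs: List[float]  # log p(token | passage, previous tokens)


def sequence_log_likelihood(sample: GeneratedQA) -> float:
    # Assumption: the filtering score is the summed token log-likelihood.
    return sum(sample.token_logprobs)


def filter_by_likelihood(samples: List[GeneratedQA], keep: int) -> List[GeneratedQA]:
    # Rank samples from most to least likely and keep the top `keep`,
    # so no separate filtering model is needed.
    ranked = sorted(samples, key=sequence_log_likelihood, reverse=True)
    return ranked[:keep]


# Toy samples with made-up log-probabilities, for demonstration only.
samples = [
    GeneratedQA("Who wrote Hamlet?", "Shakespeare", [-0.1, -0.2, -0.1]),
    GeneratedQA("What is it?", "the", [-1.5, -2.0, -1.0]),
    GeneratedQA("When was it written?", "around 1600", [-0.3, -0.4, -0.2]),
]
kept = filter_by_likelihood(samples, keep=2)
```

In practice the log-probabilities would come from the fine-tuned encoder-decoder at generation time. A length-normalized score (mean rather than sum) is a common alternative when generated sequences vary widely in length.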

Code Implementations: 1 repo
