CL · IR · Apr 14, 2019

Data Augmentation for BERT Fine-Tuning in Open-Domain Question Answering

arXiv:1904.06652v1 · 69 citations
Originality: Incremental advance
AI Analysis

This work targets researchers and practitioners who build question answering systems. The advance is incremental: it builds on existing BERT-based methods by adding data augmentation.

The paper aims to improve BERT fine-tuning for open-domain question answering. It introduces a data augmentation technique based on distant supervision that uses both positive and negative examples, together with a stage-wise fine-tuning approach. The authors report large gains over previous methods on English QA datasets and establish new baselines on Chinese QA datasets.

Recently, a simple combination of passage retrieval using off-the-shelf IR techniques and a BERT reader was found to be very effective for question answering directly on Wikipedia, yielding a large improvement over the previous state of the art on a standard benchmark dataset. In this paper, we present a data augmentation technique using distant supervision that exploits positive as well as negative examples. We apply a stage-wise approach to fine-tuning BERT on multiple datasets, starting with data that is "furthest" from the test data and ending with the "closest". Experimental results show large gains in effectiveness over previous approaches on English QA datasets, and we establish new baselines on two recent Chinese QA datasets.
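
The two ideas in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how passages are labeled or how "distance" from the test data is measured, so the case-insensitive string match, the hand-assigned distance scores, and every name below (`Example`, `label_passages`, `stagewise_order`) are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Example:
    question: str
    passage: str
    # Character offset of the answer in the passage, or None for a negative example.
    answer_start: Optional[int]


def label_passages(question: str, answer: str, passages: List[str]) -> List[Example]:
    """Distantly supervised labeling (assumed rule: case-insensitive string match).

    A retrieved passage that contains the answer string becomes a positive
    example; a passage that does not contain it becomes a negative example.
    """
    examples = []
    for passage in passages:
        start = passage.lower().find(answer.lower())
        examples.append(Example(question, passage, start if start >= 0 else None))
    return examples


def stagewise_order(distance_to_test: Dict[str, float]) -> List[str]:
    """Order dataset names from "furthest" to "closest" to the test data.

    The distance scores are hypothetical inputs supplied by the caller.
    """
    return sorted(distance_to_test, key=distance_to_test.get, reverse=True)
```

Under these assumptions, a training loop would fine-tune one model on each dataset in the order returned by `stagewise_order`, carrying the weights from one stage into the next, with the examples from `label_passages` serving as the augmented data.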

