CL · Apr 29, 2020

Benchmarking Robustness of Machine Reading Comprehension Models

arXiv:2004.14004v2 · 722 citations · Has Code
AI Analysis

This work addresses the risk of overestimating model performance when benchmarks ignore robustness, a problem relevant to researchers and practitioners in natural language processing. The contribution is incremental, since it builds on an existing benchmark.

The authors address the lack of robustness evaluation for machine reading comprehension models by constructing AdvRACE, a benchmark for testing models under adversarial attacks, and show that state-of-the-art models are vulnerable to all of the tested attacks.

Machine Reading Comprehension (MRC) is an important testbed for evaluating models' natural language understanding (NLU) ability. There has been rapid progress in this area, with new models achieving impressive performance on various benchmarks. However, existing benchmarks only evaluate models on in-domain test sets without considering their robustness under test-time perturbations or adversarial attacks. To fill this important gap, we construct AdvRACE (Adversarial RACE), a new model-agnostic benchmark for evaluating the robustness of MRC models under four different types of adversarial attacks, including our novel distractor extraction and generation attacks. We show that state-of-the-art (SOTA) models are vulnerable to all of these attacks. We conclude that there is substantial room for building more robust MRC models and our benchmark can help motivate and measure progress in this area. We release our data and code at https://github.com/NoviScl/AdvRACE .
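One way to read "model-agnostic benchmark" is that the attacked test sets are fixed data, so any model can be scored on the original test set and on each attacked copy without retraining, and the accuracy drop measures robustness. The sketch below illustrates that evaluation loop under stated assumptions: RACE-style multiple-choice items (passage, question, options, gold option index). The `MCExample` type, the `robustness_report` function, the word-overlap baseline, and the inserted-sentence perturbation are all hypothetical illustrations, not the released AdvRACE code or the paper's actual attacks.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class MCExample:
    """One multiple-choice reading comprehension item (hypothetical RACE-style layout)."""
    passage: str
    question: str
    options: List[str]
    answer: int  # index of the gold option


# Hypothetical interface: any model that maps an example to a predicted option index.
Predictor = Callable[[MCExample], int]


def accuracy(predict: Predictor, examples: Sequence[MCExample]) -> float:
    """Fraction of examples whose predicted option index equals the gold index."""
    if not examples:
        raise ValueError("empty evaluation set")
    correct = sum(1 for ex in examples if predict(ex) == ex.answer)
    return correct / len(examples)


def robustness_report(
    predict: Predictor,
    clean: Sequence[MCExample],
    attacked: Dict[str, Sequence[MCExample]],
) -> Dict[str, float]:
    """Score the same predictor on the clean set and on each named attacked set."""
    clean_acc = accuracy(predict, clean)
    report = {"clean": clean_acc}
    for name, examples in attacked.items():
        adv_acc = accuracy(predict, examples)
        report[name] = adv_acc
        report[name + "_drop"] = clean_acc - adv_acc  # accuracy lost under this attack
    return report


def overlap_predictor(ex: MCExample) -> int:
    """Toy baseline (hypothetical): pick the option sharing the most words with the passage.

    Ties go to the lowest option index.
    """
    words = set(ex.passage.lower().replace(".", "").split())
    scores = [len(words & set(opt.lower().split())) for opt in ex.options]
    return scores.index(max(scores))


# Toy data: the attacked copy appends a sentence mentioning a wrong option.
clean = [MCExample("Tom has a red ball.", "What color is Tom's ball?",
                   ["blue", "red", "green", "yellow"], answer=1)]
attacked = [MCExample("Tom has a red ball. Anna has a blue ball and a blue kite.",
                      "What color is Tom's ball?",
                      ["blue", "red", "green", "yellow"], answer=1)]

report = robustness_report(overlap_predictor, clean, {"distractor_sentence": attacked})
print(report)
```

In this toy run the word-overlap baseline answers the clean item correctly, but after the inserted sentence both "blue" and "red" overlap the passage, the tie resolves to "blue", and accuracy falls from 1.0 to 0.0. Keeping the predictor behind a plain callable is a design choice of this sketch: it lets the same report be produced for any model, which is the property a model-agnostic benchmark is meant to provide.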

Code Implementations: 1 repo