CL · Apr 4, 2020

BAE: BERT-based Adversarial Examples for Text Classification

arXiv:2004.01970v3 · 1121 citations
AI Analysis

This work addresses the susceptibility of text classification models to adversarial attacks and offers a more effective, more natural-reading approach to security testing. It is incremental, however, in that it builds on existing adversarial example techniques.

The paper tackles the problem of generating adversarial examples for text classification by introducing BAE, a black-box attack that uses BERT for contextual perturbations. Compared to prior rule-based methods, it yields stronger attacks and adversarial examples with better grammaticality and semantic coherence.

Modern text classification models are susceptible to adversarial examples, perturbed versions of the original text indiscernible by humans which get misclassified by the model. Recent works in NLP use rule-based synonym replacement strategies to generate adversarial examples. These strategies can lead to out-of-context and unnaturally complex token replacements, which are easily identifiable by humans. We present BAE, a black box attack for generating adversarial examples using contextual perturbations from a BERT masked language model. BAE replaces and inserts tokens in the original text by masking a portion of the text and leveraging the BERT-MLM to generate alternatives for the masked tokens. Through automatic and human evaluations, we show that BAE performs a stronger attack, in addition to generating adversarial examples with improved grammaticality and semantic coherence as compared to prior work.
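The replace and insert operations described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_fill_mask` and `toy_classify` are hypothetical stand-ins for the BERT-MLM and the target classifier, the function names are made up for illustration, and the sketch omits any ranking of token positions or filtering of candidates that a full attack would likely need.

```python
from typing import Callable, List, Optional

MASK = "[MASK]"


def mask_for_replace(tokens: List[str], i: int) -> List[str]:
    """Replace operation: the token at position i is swapped for a mask."""
    return tokens[:i] + [MASK] + tokens[i + 1:]


def mask_for_insert(tokens: List[str], i: int) -> List[str]:
    """Insert operation: a mask is added just before position i."""
    return tokens[:i] + [MASK] + tokens[i:]


def attack(
    tokens: List[str],
    classify: Callable[[List[str]], str],
    fill_mask: Callable[[List[str], int], List[str]],
    true_label: str,
) -> Optional[List[str]]:
    """Try single-token perturbations until the predicted label flips.

    Both callables are treated as black boxes: only their outputs are used.
    Returns the first perturbed token list that is misclassified, or None.
    """
    for i in range(len(tokens)):
        # In both operations the mask ends up at index i of the masked list.
        for masked in (mask_for_replace(tokens, i), mask_for_insert(tokens, i)):
            for candidate in fill_mask(masked, i):
                perturbed = masked[:i] + [candidate] + masked[i + 1:]
                if perturbed != tokens and classify(perturbed) != true_label:
                    return perturbed
    return None


def toy_fill_mask(masked: List[str], i: int) -> List[str]:
    # Hypothetical stand-in for a masked language model. A real model would
    # return candidates that depend on the context around the mask.
    return ["fine", "not", "dull"]


def toy_classify(tokens: List[str]) -> str:
    # Hypothetical stand-in for the target classifier: a word-count rule.
    positive = {"great", "good"}
    negative = {"not", "dull", "bad"}
    score = sum(t in positive for t in tokens) - sum(t in negative for t in tokens)
    return "positive" if score > 0 else "negative"


original = ["a", "great", "movie"]
adversarial = attack(original, toy_classify, toy_fill_mask, "positive")
```

In this toy run the first label-flipping candidate also changes the meaning of the sentence, which shows why unconstrained candidates alone are not enough to meet the semantic coherence the abstract claims: a practical attack would need some check that the perturbed text stays close to the original.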

Code Implementations: 2 repos
