CL Sep 15, 2021

BERT is Robust! A Case Against Synonym-Based Adversarial Examples in Text Classification

Jens Hauser, Zhao Meng, Damián Pascual, Roger Wattenhofer

arXiv:2109.07403v11.815 citations

Originality: Incremental advance

AI Analysis

This paper challenges the perception that BERT is vulnerable to adversarial attacks by exposing flaws in the attack methods themselves, which matters to researchers and practitioners focused on model robustness.

The paper investigates word substitution-based adversarial attacks on BERT and finds that 96-99% of these attacks do not preserve semantics, which suggests their success comes mainly from feeding the model poor data. With data augmentation and a post-processing step, the success rates of state-of-the-art attacks drop below 5%, showing that BERT is more robust than previously thought.

Deep Neural Networks have taken Natural Language Processing by storm. While this led to incredible improvements across many tasks, it also initiated a new research field, questioning the robustness of these neural networks by attacking them. In this paper, we investigate four word substitution-based attacks on BERT. We combine a human evaluation of individual word substitutions and a probabilistic analysis to show that between 96% and 99% of the analyzed attacks do not preserve semantics, indicating that their success is mainly based on feeding poor data to the model. To further confirm that, we introduce an efficient data augmentation procedure and show that many adversarial examples can be prevented by including data similar to the attacks during training. An additional post-processing step reduces the success rates of state-of-the-art attacks below 5%. Finally, by looking at more reasonable thresholds on constraints for word substitutions, we conclude that BERT is a lot more robust than research on attacks suggests.
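The abstract's step from per-substitution human judgments to a claim about whole attacks can be illustrated with a small calculation. The sketch below is not the paper's actual analysis: it assumes each word substitution preserves meaning independently with the same probability, and both the function name and the numbers are hypothetical.

```python
def prob_attack_preserves_semantics(p_valid: float, num_substitutions: int) -> float:
    """Probability that every substitution in an attack preserves semantics.

    Assumption (not taken from the paper): substitutions are independent and
    each one is judged valid with the same probability p_valid.
    """
    if not 0.0 <= p_valid <= 1.0:
        raise ValueError("p_valid must lie in [0, 1]")
    if num_substitutions < 0:
        raise ValueError("num_substitutions must be non-negative")
    # An attack preserves meaning only if all of its substitutions do.
    return p_valid ** num_substitutions


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    for k in (1, 3, 5, 10):
        p = prob_attack_preserves_semantics(0.5, k)
        print(f"{k} substitutions: {p:.4f} chance the attack preserves semantics")
```

Under these assumptions the share of semantics-preserving attacks shrinks geometrically with the number of substituted words, so even moderately reliable single substitutions can leave very few valid multi-word attacks.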
