CL · Jul 13, 2020

Generating Fluent Adversarial Examples for Natural Languages

arXiv:2007.06174v1 · 152 citations
AI Analysis

This work addresses the problem of generating effective, fluent adversarial attacks for NLP tasks. The contribution is incremental: it builds on existing gradient-based methods by adding a sampling approach.

The paper tackles the challenge of generating fluent adversarial examples in NLP by proposing MHA, which performs Metropolis-Hastings sampling with gradient-guided proposals. On the IMDB and SNLI datasets, MHA attacks more successfully than the baseline, and adversarial training with MHA improves model robustness.

Efficiently building an adversarial attacker for natural language processing (NLP) tasks is a real challenge. Firstly, as the sentence space is discrete, it is difficult to make small perturbations along the direction of gradients. Secondly, the fluency of the generated examples cannot be guaranteed. In this paper, we propose MHA, which addresses both problems by performing Metropolis-Hastings sampling, whose proposal is designed with the guidance of gradients. Experiments on IMDB and SNLI show that our proposed MHA outperforms the baseline model in attacking capability. Adversarial training with MHA also leads to better robustness and performance.
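The core idea in the abstract, Metropolis-Hastings sampling with a gradient-informed proposal, can be illustrated with a toy Python sketch. It is not the authors' implementation: the vocabulary, the linear classifier `p_positive`, the bigram-based `fluency` stand-in for a language model, and every helper name are hypothetical, and only word replacement is proposed (no insertion or deletion). The chain targets an unnormalized distribution π(x) = fluency(x) · (1 − P(true label | x)), which we assume loosely mirrors the paper's combination of a language model with the victim's confidence, so fluent, misclassified sentences are favoured. A move x → x′ is accepted with probability min{1, π(x′)g(x | x′) / (π(x)g(x′ | x))}. For a linear bag-of-words model, the gradient of the positive logit with respect to a word's count is simply that word's weight, which serves here as a crude substitute for gradient guidance.

```python
import math
import random

# Hypothetical toy vocabulary and victim: a linear bag-of-words sentiment classifier.
VOCAB = ["the", "movie", "film", "was", "really", "great", "good", "fine", "bad", "dull"]
WEIGHTS = {"great": 2.0, "good": 1.5, "fine": 0.3, "bad": -2.0, "dull": -1.5}  # others: 0.0


def p_positive(tokens):
    """Victim model: probability of the positive class (logistic over summed word weights)."""
    logit = sum(WEIGHTS.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical stand-in for a language model: a small set of "plausible" bigrams.
PLAUSIBLE = {
    ("the", "movie"), ("the", "film"), ("movie", "was"), ("film", "was"),
    ("was", "really"), ("was", "great"), ("was", "good"), ("was", "fine"),
    ("was", "bad"), ("was", "dull"), ("really", "great"), ("really", "good"),
    ("really", "bad"), ("really", "dull"),
}


def fluency(tokens):
    """Unnormalized fluency: plausible bigrams add +1 to the log-score, others add -2."""
    log_score = sum(1.0 if bg in PLAUSIBLE else -2.0 for bg in zip(tokens, tokens[1:]))
    return math.exp(log_score)


def target(tokens):
    """Unnormalized stationary distribution pi(x): high when the sentence is fluent AND
    the victim does not predict the true label (assumed here to be positive)."""
    return fluency(tokens) * (1.0 - p_positive(tokens))


def proposal_weights(lam=1.0):
    """Gradient-guided word proposal (toy). For this linear model, the gradient of the
    positive logit w.r.t. a word's count is that word's weight, so words with negative
    weight are proposed more often."""
    return {w: math.exp(-lam * WEIGHTS.get(w, 0.0)) for w in VOCAB}


def mh_attack(tokens, steps=2000, seed=0):
    """Replacement-only Metropolis-Hastings attack; stops once the victim's label flips."""
    rng = random.Random(seed)
    q = proposal_weights()
    x = list(tokens)
    for _ in range(steps):
        if p_positive(x) < 0.5:  # attack succeeded: victim now predicts the wrong label
            break
        i = rng.randrange(len(x))  # uniform position choice: cancels in the Hastings ratio
        w_new = rng.choices(VOCAB, weights=[q[w] for w in VOCAB])[0]
        x_new = x[:i] + [w_new] + x[i + 1:]
        # alpha = min(1, pi(x') g(x|x') / (pi(x) g(x'|x))). q does not depend on context,
        # so the proposal terms reduce to q[old word] / q[new word].
        ratio = (target(x_new) * q[x[i]]) / (target(x) * q[w_new])
        if rng.random() < min(1.0, ratio):
            x = x_new
    return x


adv = mh_attack(["the", "movie", "was", "really", "great"])
print(" ".join(adv), round(p_positive(adv), 3))
```

Because the proposal is deliberately biased toward negative words, the Hastings correction q[old]/q[new] is what keeps the chain sampling from π rather than from a distribution skewed toward whatever the proposal favours. The gradient information only speeds up the search; the acceptance step still weighs fluency against misclassification.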

