CL · CR · LG · Jan 22, 2018

Adversarial Texts with Gradient Methods

arXiv:1801.07175v2 · 79 citations
AI Analysis

This work addresses adversarial attacks in natural language processing. It is an incremental adaptation of image-based attack methods to text, aimed at researchers and practitioners.

Gradient-based methods are effective for generating adversarial images but are hard to apply to discrete text inputs. The authors propose a framework that searches for adversarial examples in the embedding space and reconstructs the text via nearest neighbor search. The resulting adversarial texts differ from the originals by only a few words; in some cases, changing a single word is enough to alter the predicted label.

Adversarial samples for images have been extensively studied in the literature. Among the many attacking methods, gradient-based methods are both effective and easy to compute. In this work, we propose a framework to adapt the gradient attacking methods on images to the text domain. The main difficulties for generating adversarial texts with gradient methods are i) the input space is discrete, which makes it difficult to accumulate small noise directly in the inputs, and ii) the measurement of the quality of the adversarial texts is difficult. We tackle the first problem by searching for adversarial examples in the embedding space and then reconstructing the adversarial texts via nearest neighbor search. For the latter problem, we employ the Word Mover's Distance (WMD) to quantify the quality of adversarial texts. Through extensive experiments on three datasets, IMDB movie reviews, Reuters-2 and Reuters-5 newswires, we show that our framework can leverage gradient attacking methods to generate very high-quality adversarial texts that are only a few words different from the original texts. There are many cases where we can change one word to alter the label of the whole piece of text. We successfully incorporate FGM and DeepFool into our framework. In addition, we empirically show that WMD is closely related to the quality of adversarial texts.
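The two-step idea in the abstract (perturb in embedding space, then map back to words via nearest neighbor search) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the toy vocabulary, the 2-D embedding values, the linear classifier on the mean embedding (`W`, `B`), and the function names. The gradient step follows the general shape of a fast gradient method with an L2-normalized gradient; it is not the authors' implementation.

```python
import math

# Hypothetical toy vocabulary with 2-D embeddings (illustrative values only).
# First axis roughly encodes sentiment, second axis encodes topic.
EMBEDDINGS = {
    "good": (1.0, 0.0),
    "great": (1.5, 0.0),
    "bad": (-1.0, 0.0),
    "awful": (-1.5, 0.0),
    "movie": (0.0, 3.0),
    "film": (0.3, 3.0),
}

# Hypothetical classifier: p(positive) = sigmoid(W . mean_embedding + B).
W = (2.0, 0.0)
B = 0.0


def predict(words):
    """Probability of the positive class for a list of words."""
    n = len(words)
    mean = [sum(EMBEDDINGS[w][d] for w in words) / n for d in range(len(W))]
    z = sum(wi * mi for wi, mi in zip(W, mean)) + B
    return 1.0 / (1.0 + math.exp(-z))


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(vec):
    """Reconstruction step: map a perturbed vector to the closest vocabulary word."""
    return min(EMBEDDINGS, key=lambda w: squared_distance(EMBEDDINGS[w], vec))


def attack(words, label, eps):
    """One gradient step in embedding space, followed by nearest neighbor search.

    For this mean-pooling model, the gradient of the cross-entropy loss with
    respect to each word embedding is (p - label) * W / n, the same at every
    position.
    """
    n = len(words)
    p = predict(words)
    grad = [(p - label) * wi / n for wi in W]
    norm = math.sqrt(sum(g * g for g in grad)) or 1.0  # guard against a zero gradient
    step = [eps * g / norm for g in grad]  # move in the direction that increases the loss
    adversarial = []
    for w in words:
        shifted = [e + s for e, s in zip(EMBEDDINGS[w], step)]
        adversarial.append(nearest_word(shifted))
    return adversarial


original = ["good", "movie"]
adversarial = attack(original, label=1, eps=1.2)
print(original, predict(original))
print(adversarial, predict(adversarial))
```

In this toy setup the same perturbation is applied to every word, but only words whose shifted embedding lands closer to a different vocabulary entry actually change. Here "good" becomes "bad" while "movie" stays put, which mirrors the abstract's observation that a single word change can flip the label.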

Code Implementations: 1 repo