CR · LG · Apr 26, 2017

Deep Text Classification Can be Fooled

arXiv:1704.08006v20.00450 citations
AI Analysis 75

The paper reveals a critical security flaw in text classification systems. The contribution is incremental in that it extends adversarial attack methods from images to text.

The paper addresses the vulnerability of deep neural network-based text classifiers to adversarial attacks. It develops a method for crafting adversarial text samples that fool state-of-the-art classifiers, perturbing a sample into any desired class without compromising its utility and with changes that are difficult to perceive.

In this paper, we present an effective method to craft text adversarial samples, revealing an important yet underestimated fact: DNN-based text classifiers are also prone to adversarial sample attacks. Specifically, depending on the adversarial scenario, the text items that are important for classification are identified either by computing the cost gradients of the input (white-box attack) or by generating a series of occluded test samples (black-box attack). Based on these items, we design three perturbation strategies, namely insertion, modification, and removal, to generate adversarial samples. The experimental results show that the adversarial samples generated by our method can successfully fool both state-of-the-art character-level and word-level DNN-based text classifiers. The adversarial samples can be perturbed to any desired class without compromising their utility. At the same time, the introduced perturbation is difficult to perceive.
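
The black-box branch of the abstract (rank text items by occluding them, then perturb the most important one) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: `toy_positive_prob` is a hypothetical keyword-weighted stand-in for a real DNN classifier, occlusion is simplified to dropping one token at a time, and only the removal strategy is shown. It is not the authors' implementation.

```python
import math
from typing import Callable, List, Tuple

# Hypothetical toy classifier weights (an assumption, not the paper's model).
POSITIVE_WEIGHTS = {"great": 2.0, "good": 1.0, "boring": -1.5, "bad": -2.0}


def toy_positive_prob(tokens: List[str]) -> float:
    """Return a stand-in probability that the text belongs to the 'positive' class."""
    score = sum(POSITIVE_WEIGHTS.get(t.lower(), 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-score))


def occlusion_importance(
    tokens: List[str], prob_fn: Callable[[List[str]], float]
) -> List[Tuple[int, float]]:
    """Rank token positions by how much the class probability drops when
    that token is occluded (here: simply left out). Only model outputs are
    queried, so no gradients are needed."""
    base = prob_fn(tokens)
    drops = []
    for i in range(len(tokens)):
        occluded = tokens[:i] + tokens[i + 1:]
        drops.append((i, base - prob_fn(occluded)))
    # Largest drop first: these are the items most important for the class.
    return sorted(drops, key=lambda pair: pair[1], reverse=True)


def remove_most_important(
    tokens: List[str], prob_fn: Callable[[List[str]], float]
) -> List[str]:
    """Removal strategy (simplified): delete the single most important token."""
    idx, _ = occlusion_importance(tokens, prob_fn)[0]
    return tokens[:idx] + tokens[idx + 1:]


tokens = "a great film with a boring ending".split()
ranking = occlusion_importance(tokens, toy_positive_prob)
adversarial = remove_most_important(tokens, toy_positive_prob)

print("most important token:", tokens[ranking[0][0]])
print("original  P(positive):", round(toy_positive_prob(tokens), 3))
print("perturbed P(positive):", round(toy_positive_prob(adversarial), 3))
```

In this toy setting, removing one token is enough to push the probability across the 0.5 decision boundary. A white-box variant would replace `occlusion_importance` with a ranking based on the gradient of the cost with respect to the input, which requires access to the model internals.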
