IR · CL · CR · LG · Sep 6, 2019

Natural Adversarial Sentence Generation with Gradient-based Perturbation

arXiv:1909.04495v1 · 2 citations
Originality: Incremental advance
AI Analysis

This paper addresses robustness testing for text classification models. It offers incremental improvements in the naturalness of the generated adversarial examples and in black-box attack capability.

The paper tackles the problem of generating natural adversarial sentences to test the robustness of text classification models. On a public sentiment analysis API it achieves a 20% relative decrease in average accuracy and a 74% relative increase in absolute error.
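The page does not define "relative" change. Assuming it carries the usual meaning, it compares the metric under attack with its value on clean inputs. In the formula below, $m_{\text{clean}}$ and $m_{\text{adv}}$ are placeholder symbols, not names taken from the paper:

```latex
\Delta_{\text{rel}} = \frac{m_{\text{adv}} - m_{\text{clean}}}{m_{\text{clean}}}
```

Under this reading, a 20% relative decrease in accuracy corresponds to $\Delta_{\text{rel}} = -0.20$. A 74% relative increase in absolute error corresponds to $\Delta_{\text{rel}} = +0.74$.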

This work proposes a novel algorithm that generates natural language adversarial input for text classification models in order to investigate the robustness of these models. It applies gradient-based perturbation to the sentence embeddings used as features by the classifier and learns a decoder to generate sentences from the perturbed embeddings. We apply this method to a sentiment analysis model and verify that it effectively induces incorrect predictions. We also conduct quantitative and qualitative analyses of these examples and demonstrate that our approach generates more natural adversaries. In addition, it can be used to perform successful black-box attacks, that is, attacks on other existing models whose parameters are unknown. On a public sentiment analysis API, the proposed method causes a 20% relative decrease in average accuracy and a 74% relative increase in absolute error.
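The abstract's core step nudges a sentence embedding along the gradient of the classifier's loss until the prediction changes. The sketch below illustrates that step under stated assumptions and is not the authors' implementation. The linear logistic classifier, the example weights and embedding, the sign-of-gradient update rule, and the names `perturb_embedding`, `predict_proba` and `loss_grad` are all hypothetical. The learned decoder that turns the perturbed embedding back into a sentence is omitted.

```python
import math
from typing import List, Tuple

Vector = List[float]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def predict_proba(z: Vector, w: Vector, b: float) -> float:
    # P(positive | z) under a hypothetical linear sentiment classifier.
    return sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b)


def loss_grad(z: Vector, w: Vector, b: float, y: int) -> Vector:
    # Gradient of binary cross-entropy w.r.t. the embedding z;
    # for a logistic classifier this is (p - y) * w.
    p = predict_proba(z, w, b)
    return [(p - y) * wi for wi in w]


def sign(x: float) -> float:
    return float((x > 0) - (x < 0))


def perturb_embedding(
    z: Vector, w: Vector, b: float, y: int, step: float = 0.1, max_iters: int = 100
) -> Tuple[Vector, int]:
    """Move z uphill on the loss until the predicted label differs from y.

    Returns the perturbed embedding and the number of steps taken.
    Decoding the result back into a sentence is not modeled here.
    """
    z_adv = list(z)
    for i in range(max_iters):
        predicted = 1 if predict_proba(z_adv, w, b) >= 0.5 else 0
        if predicted != y:
            return z_adv, i
        g = loss_grad(z_adv, w, b, y)
        # Sign-of-gradient (FGSM-style) step: an assumed update rule.
        z_adv = [zi + step * sign(gi) for zi, gi in zip(z_adv, g)]
    return z_adv, max_iters


# Usage with hypothetical values: a correctly classified positive embedding.
w, b = [1.0, -0.5], 0.0
z = [0.6, 0.2]
z_adv, steps = perturb_embedding(z, w, b, y=1)
print(steps, predict_proba(z_adv, w, b) < 0.5)  # 4 True
```

The classifier here is linear, so the gradient has a closed form. With a neural classifier, the same loop would obtain the gradient through automatic differentiation instead.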

Code Implementations: 1 repo
