LG · CR · Dec 14, 2017

DANCin SEQ2SEQ: Fooling Text Classifiers with Adversarial Text Example Generation

arXiv:1712.05419v1 · 30 citations · Has Code
Originality: Incremental advance
AI Analysis

This paper addresses the vulnerability of text classifiers to adversarial attacks. It is an incremental advance: it builds on existing adversarial example work but focuses on black-box scenarios, which had seen limited prior exploration.

The paper tackles the problem of generating adversarial text examples for largely black-box text classifiers. It introduces DANCin SEQ2SEQ, a GAN-inspired algorithm that recasts the task as a reinforcement learning problem, and reports preliminary but promising steps towards semantically meaningful adversarial examples in a real-world attack scenario.

Machine learning models are powerful but fallible. Generating adversarial examples - inputs deliberately crafted to cause model misclassification or other errors - can yield important insight into model assumptions and vulnerabilities. Despite significant recent work on adversarial example generation targeting image classifiers, relatively little work exists exploring adversarial example generation for text classifiers; additionally, many existing adversarial example generation algorithms require full access to target model parameters, rendering them impractical for many real-world attacks. In this work, we introduce DANCin SEQ2SEQ, a GAN-inspired algorithm for adversarial text example generation targeting largely black-box text classifiers. We recast adversarial text example generation as a reinforcement learning problem, and demonstrate that our algorithm offers preliminary but promising steps towards generating semantically meaningful adversarial text examples in a real-world attack scenario.
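
The reinforcement learning framing in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm: it swaps in plain REINFORCE with a moving-average baseline over a single word choice, and every name in it (`black_box_classifier`, `CANDIDATES`, `train`) is hypothetical. The only point it shows is that a generator can be trained from the classifier's outputs alone, without access to its parameters.

```python
import math
import random

# Hypothetical black-box classifier: the attacker may call it but not inspect it.
# It returns the probability of the "negative" label.
def black_box_classifier(tokens):
    return 1.0 if "bad" in tokens else 0.0

# Hypothetical candidate words for the middle slot of a three-word sentence.
CANDIDATES = ["bad", "poor", "awful"]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train(steps=500, lr=0.5, seed=0):
    """Learn a distribution over CANDIDATES that fools the classifier."""
    rng = random.Random(seed)
    logits = [0.0] * len(CANDIDATES)
    baseline = 0.0  # moving average of rewards, used to reduce variance
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(len(CANDIDATES)), weights=probs)[0]
        tokens = ["the", CANDIDATES[action], "movie"]
        # Reward is high when the classifier fails to flag the text.
        reward = 1.0 - black_box_classifier(tokens)
        advantage = reward - baseline
        baseline += 0.1 * (reward - baseline)
        # REINFORCE update: d log pi(action) / d logits = one_hot(action) - probs
        for i in range(len(logits)):
            grad = (1.0 if i == action else 0.0) - probs[i]
            logits[i] += lr * advantage * grad
    return softmax(logits)

if __name__ == "__main__":
    for word, p in zip(CANDIDATES, train()):
        print(f"{word}: {p:.3f}")
```

In this toy setting the policy shifts probability away from the word the classifier keys on. A real attack would also need a term that keeps the generated text semantically close to the original, which the sketch omits.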

Code Implementations: 1 repo