CL LG · Jan 20, 2022

TextHacker: Learning based Hybrid Local Search Algorithm for Text Hard-label Adversarial Attack

arXiv:2201.08193v2292 citations
Originality Highly original
AI Analysis

The paper addresses a more rigorous and practical setting for adversarial attacks in NLP, one that matches real-world applications where only hard-label access is available.

The paper tackles the problem of generating adversarial text examples when only the prediction label is accessible. It proposes TextHacker, which uses a hybrid local search algorithm guided by learned word importance, and reports significantly better attack performance and adversary quality than existing hard-label attacks.

Existing textual adversarial attacks usually rely on the gradient or the prediction confidence to generate adversarial examples, which makes them hard to deploy in real-world applications. To this end, we consider a rarely investigated but more rigorous setting, namely the hard-label attack, in which the attacker can access only the prediction label. In particular, we find that the importance of different words can be learned from the changes in the prediction label caused by word substitutions on the adversarial examples. Based on this observation, we propose a novel adversarial attack, termed Text Hard-label attacker (TextHacker). TextHacker first randomly perturbs many words to craft an adversarial example. It then adopts a hybrid local search algorithm, guided by word importance estimated from the attack history, to minimize the adversarial perturbation. Extensive evaluations on text classification and textual entailment show that TextHacker significantly outperforms existing hard-label attacks in both attack performance and adversary quality.
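
The two stages described in the abstract (random perturbation, then perturbation minimization guided by word importance learned from the attack history) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the victim `toy_label`, the `SYNONYMS` table, and the greedy restore loop in `minimize` are all hypothetical stand-ins, and the greedy loop is a much simpler substitute for the paper's hybrid local search. The only thing the sketch keeps faithful is the hard-label constraint: the attacker calls a function that returns a label and nothing else.

```python
import random

# Hypothetical synonym table; the paper's actual candidate sets are not given here.
SYNONYMS = {
    "good": ["fine", "decent"],
    "great": ["grand", "big"],
    "movie": ["film", "picture"],
    "loved": ["liked", "enjoyed"],
}

def toy_label(words):
    """Stand-in hard-label victim: returns only a label, never a score."""
    positive = {"good", "great", "loved"}
    return 1 if sum(w in positive for w in words) >= 2 else 0

def random_init(orig, label_fn, rng, max_tries=100):
    """Stage 1: randomly substitute many words until the label changes."""
    true_label = label_fn(orig)
    positions = [i for i, w in enumerate(orig) if w in SYNONYMS]
    for _ in range(max_tries):
        adv = list(orig)
        for i in positions:
            if rng.random() < 0.8:
                adv[i] = rng.choice(SYNONYMS[orig[i]])
        if label_fn(adv) != true_label:
            return adv
    return None  # no adversarial example found within the query budget

def minimize(orig, adv, label_fn, rounds=3):
    """Stage 2 (simplified): restore original words while staying adversarial.

    importance[i] grows each time restoring position i flips the label back,
    so later rounds try the least important positions first.
    """
    true_label = label_fn(orig)
    importance = [0.0] * len(orig)
    adv = list(adv)
    for _ in range(rounds):
        changed = [i for i in range(len(orig)) if adv[i] != orig[i]]
        changed.sort(key=lambda i: importance[i])
        for i in changed:
            trial = list(adv)
            trial[i] = orig[i]
            if label_fn(trial) != true_label:
                adv = trial            # still adversarial: keep the restoration
            else:
                importance[i] += 1.0   # this substitution is needed for the attack
    return adv, importance

rng = random.Random(0)
orig = "i loved this great movie it was good".split()
adv = random_init(orig, toy_label, rng)
if adv is not None:
    adv_min, importance = minimize(orig, adv, toy_label)
    n_changed = sum(a != o for a, o in zip(adv_min, orig))
    print(" ".join(adv_min), "| words changed:", n_changed)
```

In this toy setup the label flips only when at least two of the three sentiment words are replaced, so the minimization step always ends with exactly two substitutions, and the words whose restoration would undo the attack end up with nonzero importance.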

Code Implementations: 1 repo
