CR CL LG · Mar 21, 2024

Reversible Jump Attack to Textual Classifiers with Modification Reduction

arXiv:2403.14731v12.31 citations · h-index: 11 · Has Code · Mach learn

Originality: Incremental advance

AI Analysis

This paper addresses vulnerabilities in textual classifiers for NLP applications and represents an incremental improvement over existing adversarial attack techniques.

The paper tackles the problem of generating adversarial examples for NLP models by proposing two algorithms: RJA, which targets attack effectiveness, and MMR, which targets imperceptibility. The authors report that the combined method outperforms state-of-the-art methods on metrics such as attack performance and fluency.

Recent studies on adversarial examples expose vulnerabilities of natural language processing (NLP) models. Existing techniques for generating adversarial examples are typically driven by deterministic hierarchical rules that are agnostic to the optimal adversarial examples, a strategy that often yields adversarial samples with a suboptimal balance between the magnitude of changes and attack success. To this end, we propose two algorithms, Reversible Jump Attack (RJA) and Metropolis-Hasting Modification Reduction (MMR), to generate highly effective adversarial examples and to improve their imperceptibility, respectively. RJA uses a novel randomization mechanism to enlarge the search space and efficiently adapts the number of perturbed words in each adversarial example. Given these generated adversarial examples, MMR applies the Metropolis-Hastings sampler to enhance their imperceptibility. Extensive experiments demonstrate that RJA-MMR outperforms current state-of-the-art methods in attack performance, imperceptibility, fluency, and grammatical correctness.
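
The abstract does not give the details of the sampler, so the following is only a rough illustration of a generic Metropolis-Hastings accept/reject loop applied to the idea of reducing modifications. It is not the authors' MMR implementation. Everything in it is an assumption made for the sketch: the names `target_weight` and `mh_reduce`, the per-word `impact` scores, the `threshold` that stands in for "the attack still succeeds", and the exponential `penalty` on the number of perturbed words are all hypothetical.

```python
import math
import random


def target_weight(mask, impact, threshold, penalty):
    """Unnormalized target density for a toy problem (hypothetical).

    mask[i] == 1 means word i stays perturbed. The weight is zero when the
    toy attack fails (total impact below threshold); otherwise it decays
    exponentially with the number of perturbed words.
    """
    total = sum(w for w, m in zip(impact, mask) if m)
    if total < threshold:
        return 0.0
    return math.exp(-penalty * sum(mask))


def mh_reduce(impact, threshold, penalty=2.0, steps=2000, seed=0):
    """Generic Metropolis-Hastings search over which perturbations to keep."""
    rng = random.Random(seed)
    mask = [1] * len(impact)  # start with every word perturbed
    current = target_weight(mask, impact, threshold, penalty)
    if current == 0.0:
        raise ValueError("the starting example must already be a successful attack")
    best = list(mask)
    for _ in range(steps):
        # Symmetric proposal: flip one randomly chosen position.
        i = rng.randrange(len(mask))
        proposal = list(mask)
        proposal[i] ^= 1
        weight = target_weight(proposal, impact, threshold, penalty)
        # Accept with probability min(1, weight / current); because the
        # proposal is symmetric, the proposal terms cancel in the ratio.
        if rng.random() < min(1.0, weight / current):
            mask, current = proposal, weight
            if sum(mask) < sum(best):
                best = list(mask)
    return best


if __name__ == "__main__":
    toy_impact = [0.1, 0.6, 0.05, 0.5, 0.2]  # made-up scores, not real data
    kept = mh_reduce(toy_impact, threshold=1.0)
    print("perturbations kept:", kept)
```

Proposals that would make the toy attack fail have zero weight and are never accepted, so every visited state remains a successful attack while the penalty term pulls the chain toward fewer modified words.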
