CL | Jul 7, 2024

Rethinking Targeted Adversarial Attacks For Neural Machine Translation

Junjie Wu, Lemao Liu, Wei Bi, Dit-Yan Yeung

arXiv:2407.05319v1 | 1.03 citations | h-index: 68 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses robustness evaluation issues in NMT systems, offering a more reliable framework for targeted adversarial attacks, though it is incremental as it refines existing settings rather than introducing a new paradigm.

The paper shows that existing settings for targeted adversarial attacks on neural machine translation (NMT) systems overestimate attack results. It proposes a new setting and a Targeted Word Gradient adversarial Attack (TWGA) method to produce reliable attacks, and reports experimental results showing that the method is effective.

Targeted adversarial attacks are widely used to evaluate the robustness of neural machine translation (NMT) systems. This paper first identifies a critical issue in the existing settings of NMT targeted adversarial attacks: their attacking results are largely overestimated. To address this, it presents a new setting for NMT targeted adversarial attacks that can lead to reliable attacking results. Under the new setting, it then proposes a Targeted Word Gradient adversarial Attack (TWGA) method to craft adversarial examples. Experimental results demonstrate that the proposed setting can provide faithful attacking results for targeted adversarial attacks on NMT systems, and that the proposed TWGA method can effectively attack such victim NMT systems. In-depth analyses on a large-scale dataset further illustrate some valuable findings. Code and data are available at https://github.com/wujunjie1998/TWGA.
