Are Synonym Substitution Attacks Really Synonym Substitution Attacks?
This work addresses a critical flaw in adversarial attack methods for NLP: widely used synonym substitution attacks often fail to generate valid adversarial samples. The finding matters to researchers and practitioners in AI security who rely on these attacks to evaluate model robustness.
The paper investigates whether synonym substitution attacks (SSAs) truly replace words with synonyms, finding that four common methods produce many invalid substitutions that are ungrammatical or alter semantics, and that current constraints for detecting these issues are highly insufficient.
In this paper, we explore the following question: Are synonym substitution attacks (SSAs) really synonym substitution attacks? We approach this question by examining how SSAs replace words in the original sentence, and we show that unresolved obstacles still cause current SSAs to generate invalid adversarial samples. We reveal that four widely used word substitution methods generate a large fraction of invalid substitution words, which are ungrammatical or do not preserve the original sentence's semantics. Next, we show that the semantic and grammatical constraints that SSAs use to detect invalid word replacements are highly insufficient for detecting invalid adversarial samples.
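To make the terminology concrete, the following is a minimal sketch of the word-replacement step and of what an "invalid substitution" looks like. It is not the paper's method or any attack library's API: the function names, the example sentence, the candidate list, and the hand-written validity labels are all hypothetical and exist only for illustration.

```python
from typing import Callable, List

def substitute(tokens: List[str], index: int, candidates: List[str]) -> List[List[str]]:
    """Return one perturbed copy of `tokens` per candidate, replacing tokens[index]."""
    return [tokens[:index] + [c] + tokens[index + 1:] for c in candidates]

def filter_valid(
    tokens: List[str],
    index: int,
    candidates: List[str],
    is_valid: Callable[[List[str], int, str], bool],
) -> List[str]:
    """Keep only the candidates that the supplied constraint accepts."""
    return [c for c in candidates if is_valid(tokens, index, c)]

# Toy inputs (assumptions, not data from the paper).
tokens = ["the", "movie", "was", "good"]
candidates = ["great", "bad", "well"]  # a synonym, an antonym, an ungrammatical fit

# Hand-written judgment standing in for a human validity check (assumption).
HAND_LABELED_VALID = {"great"}

def toy_is_valid(tokens: List[str], index: int, candidate: str) -> bool:
    # A real constraint would score semantics and grammar; this one only looks up a label.
    return candidate in HAND_LABELED_VALID

perturbed = substitute(tokens, 3, candidates)
valid = filter_valid(tokens, 3, candidates, toy_is_valid)
invalid_fraction = 1 - len(valid) / len(candidates)
```

In this toy setup, "bad" changes the sentence's meaning and "well" makes it ungrammatical, so only one of the three candidates is a valid substitution. The abstract's second claim corresponds to the case where the constraint passed as `is_valid` accepts such candidates anyway.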