Style Attack Disguise: When Fonts Become a Camouflage for Adversarial Intent
This paper exposes a security problem in NLP and multimodal AI systems by exploiting a human-model perception gap, though the contribution is incremental because it builds on existing adversarial attack methods.
The paper tackles the vulnerability of NLP models to adversarial attacks based on stylistic fonts, which humans read without difficulty but models process as distinct tokens. It reports that the proposed Style Attack Disguise (SAD) achieves strong attack performance across a range of models and tasks.
With social media growth, users employ stylistic fonts and font-like emoji to express individuality, creating visually appealing text that remains human-readable. However, these fonts introduce hidden vulnerabilities in NLP models: while humans easily read stylistic text, models process these characters as distinct tokens, causing interference. We identify this human-model perception gap and propose a style-based attack, Style Attack Disguise (SAD). We design two sizes: light for query efficiency and strong for superior attack performance. Experiments on sentiment classification and machine translation across traditional models, LLMs, and commercial services demonstrate SAD's strong attack performance. We also show SAD's potential threats to multimodal tasks including text-to-image and text-to-speech generation.
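The perception gap described in the abstract can be illustrated with a minimal sketch. This is not the SAD method itself, which the abstract does not specify. It assumes the stylistic font is the Unicode Mathematical Bold block, and the helper name `to_math_bold` is hypothetical. The sketch shows that the styled text looks like the original to a reader but consists of entirely different code points, and that Unicode NFKC normalization maps these particular characters back to plain ASCII.

```python
import unicodedata

# Assumption for illustration: the "stylistic font" is the Unicode
# Mathematical Bold block, which is contiguous for A-Z and a-z.
BOLD_UPPER_START = 0x1D400  # MATHEMATICAL BOLD CAPITAL A
BOLD_LOWER_START = 0x1D41A  # MATHEMATICAL BOLD SMALL A


def to_math_bold(text: str) -> str:
    """Hypothetical helper: map ASCII letters to Mathematical Bold letters."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr(BOLD_UPPER_START + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr(BOLD_LOWER_START + ord(ch) - ord("a")))
        else:
            out.append(ch)  # leave spaces, digits, punctuation untouched
    return "".join(out)


plain = "great movie"
styled = to_math_bold(plain)

# Same number of characters, but no letter shares a code point with the original.
print(styled)
print(styled == plain)
print(len(plain.encode("utf-8")), len(styled.encode("utf-8")))

# NFKC compatibility normalization folds these characters back to ASCII.
print(unicodedata.normalize("NFKC", styled) == plain)
```

A tokenizer that operates on raw code points or bytes sees `styled` as a sequence unrelated to `plain`, which is the kind of interference the abstract describes. Whether normalization of this kind is an effective defense against SAD is not stated in the source.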