CL · AI · Oct 22, 2025

Style Attack Disguise: When Fonts Become a Camouflage for Adversarial Intent

arXiv:2510.19641v1 · 2 citations · h-index: 2
Originality: Incremental advance
AI Analysis

This paper addresses a security problem for NLP and multimodal AI systems by exploiting a gap between human and model perception. The contribution is incremental, since it builds on existing adversarial attack methods.

The paper targets the vulnerability of NLP models to adversarial text written in stylistic fonts, which humans read easily but models process as distinct tokens. It reports that the proposed Style Attack Disguise (SAD) achieves strong attack performance on sentiment classification and machine translation across traditional models, LLMs, and commercial services.

With social media growth, users employ stylistic fonts and font-like emoji to express individuality, creating visually appealing text that remains human-readable. However, these fonts introduce hidden vulnerabilities in NLP models: while humans easily read stylistic text, models process these characters as distinct tokens, causing interference. We identify this human-model perception gap and propose a style-based attack, Style Attack Disguise (SAD). We design two sizes: light for query efficiency and strong for superior attack performance. Experiments on sentiment classification and machine translation across traditional models, LLMs, and commercial services demonstrate SAD's strong attack performance. We also show SAD's potential threats to multimodal tasks including text-to-image and text-to-speech generation.
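The perception gap described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's SAD algorithm. It assumes one particular "stylistic font", the Unicode Mathematical Bold letters, and the helper names `to_math_bold` and `utf8_len` are hypothetical. The sketch shows that styled text stays readable to a human while its code points and byte length differ from the plain text, and that NFKC normalization folds this particular style back to ASCII.

```python
import unicodedata

# Assumption for illustration: use the Unicode "Mathematical Bold" letters
# as the stylistic font. The paper's actual font set is not specified here.
BOLD_UPPER_A = 0x1D400  # MATHEMATICAL BOLD CAPITAL A
BOLD_LOWER_A = 0x1D41A  # MATHEMATICAL BOLD SMALL A


def to_math_bold(text: str) -> str:
    """Map ASCII letters to Mathematical Bold; leave other characters alone."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr(BOLD_UPPER_A + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr(BOLD_LOWER_A + ord(ch) - ord("a")))
        else:
            out.append(ch)
    return "".join(out)


def utf8_len(text: str) -> int:
    """Number of bytes in the UTF-8 encoding, a rough proxy for byte-level input size."""
    return len(text.encode("utf-8"))


plain = "great movie"
styled = to_math_bold(plain)

# Same number of characters, but entirely different code points for the letters.
print(styled)
print(unicodedata.name(styled[0]))
print(utf8_len(plain), utf8_len(styled))

# NFKC compatibility normalization maps this style back to plain ASCII.
recovered = unicodedata.normalize("NFKC", styled)
print(recovered == plain)
```

A model that sees the raw code points receives an input unrelated to the plain string at the character level, which is one way to picture the "distinct tokens" effect. The normalization step is only a sanity check on this one style and should not be read as a defense evaluated in the paper. Styles built from characters without compatibility mappings, such as font-like emoji, would not be folded back this way.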
