CL | Mar 24, 2025

Masks and Mimicry: Strategic Obfuscation and Impersonation Attacks on Authorship Verification

Kenneth Alperin, Rohan Leekha, Adaku Uchendu, Trang Nguyen, Srilakshmi Medarametla, Carlos Levya Capote, Seth Aycock, Charlie Dagli

arXiv:2503.19099v1 | 12 citations | h-index: 10 | Proceedings of the 5th International Conference on Natural Language Processing for Digital Humanities

Originality: Incremental advance

AI Analysis

This paper addresses security risks in AI-based authorship identification, showing vulnerabilities that malicious actors could exploit, though its contribution is incremental because it evaluates existing attack methods.

The paper studies LLM-based adversarial attacks on authorship verification models, reporting maximum attack success rates of 92% for obfuscation and 78% for impersonation while preserving the semantics of the original text.

The increasing use of Artificial Intelligence (AI) technologies, such as Large Language Models (LLMs), has led to nontrivial improvements in various tasks, including accurate authorship identification of documents. However, while LLMs improve such defense techniques, they also provide a vehicle for malicious actors to launch new attack vectors. To combat this security risk, we evaluate the adversarial robustness of authorship models (specifically an authorship verification model) to potent LLM-based attacks. These attacks include untargeted methods (authorship obfuscation) and targeted methods (authorship impersonation). The objective of obfuscation is to mask an author's writing style, and the objective of impersonation is to mimic an author's writing style, in both cases while preserving the semantics of the original text. Thus, we perturb an accurate authorship verification model and achieve maximum attack success rates of 92% for obfuscation attacks and 78% for impersonation attacks.
