CL · Aug 28, 2024

StyleRemix: Interpretable Authorship Obfuscation via Distillation and Perturbation of Style Elements

Jillian Fisher, Skyler Hallinan, Ximing Lu, Mitchell Gordon, Zaid Harchaoui, Yejin Choi

arXiv:2408.15666v1 · 15.928 citations · h-index: 33 · Has Code

Originality: Highly original

AI Analysis

The paper addresses interpretable and controllable authorship obfuscation for privacy and security applications, proposing a novel method for a known bottleneck.

The paper tackles authorship obfuscation with StyleRemix, which perturbs fine-grained style elements to obscure author identity. It outperforms state-of-the-art baselines and larger LLMs across several domains in both automatic and human evaluations.

Authorship obfuscation, rewriting a text to intentionally obscure the identity of the author, is an important but challenging task. Current methods using large language models (LLMs) lack interpretability and controllability, often ignoring author-specific stylistic features, resulting in less robust performance overall. To address this, we develop StyleRemix, an adaptive and interpretable obfuscation method that perturbs specific, fine-grained style elements of the original input text. StyleRemix uses pre-trained Low-Rank Adaptation (LoRA) modules to rewrite an input specifically along various stylistic axes (e.g., formality and length) while maintaining low computational cost. StyleRemix outperforms state-of-the-art baselines and much larger LLMs in a variety of domains as assessed by both automatic and human evaluation. Additionally, we release AuthorMix, a large set of 30K high-quality, long-form texts from a diverse set of 14 authors and 4 domains, and DiSC, a parallel corpus of 1,500 texts spanning seven style axes in 16 unique directions.
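
The idea of perturbing specific style axes can be illustrated with a minimal sketch. It is not the authors' implementation: the scoring scheme, the function name `select_style_edits`, the `other_axis` placeholder, and all numeric values are assumptions made for illustration. The sketch picks the axes on which an author deviates most from a reference profile and proposes a rewrite direction for each, which is the kind of decision that a per-axis rewriting module (such as a LoRA adapter) could then act on.

```python
from typing import Dict, List, Tuple


def select_style_edits(
    author: Dict[str, float],
    reference: Dict[str, float],
    top_k: int = 2,
) -> List[Tuple[str, str, float]]:
    """Return up to top_k (axis, direction, strength) edits.

    Hypothetical scheme: each axis has a score in [0, 1]; the axes with the
    largest gap between author and reference are chosen, and the direction
    pushes the author's score toward the reference.
    """
    edits = []
    for axis, score in author.items():
        gap = score - reference[axis]
        direction = "lower" if gap > 0 else "higher"
        edits.append((axis, direction, abs(gap)))
    # Largest deviation first; ties broken by axis name for determinism.
    edits.sort(key=lambda e: (-e[2], e[0]))
    return edits[:top_k]


# Illustrative scores only; "formality" and "length" are axes named in the
# abstract, "other_axis" is a placeholder.
author = {"formality": 0.9, "length": 0.65, "other_axis": 0.1}
reference = {"formality": 0.5, "length": 0.6, "other_axis": 0.4}

edits = select_style_edits(author, reference, top_k=2)
for axis, direction, strength in edits:
    print(f"{axis}: push {direction} (gap {strength:.2f})")
```

With these made-up scores the sketch selects formality (to be lowered) and the placeholder axis (to be raised), leaving length untouched because its gap is small. Because each selected edit names an axis and a direction, the choice stays readable to a user, which is the sense of interpretability the abstract describes.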
