CL, CR, LG | Mar 22, 2022

A Girl Has A Name, And It's ... Adversarial Authorship Attribution for Deobfuscation

arXiv:2203.11849v1 | 7 citations | h-index: 38
Originality: Incremental advance
AI Analysis

This work addresses a privacy threat to users of text obfuscation by exposing vulnerabilities in existing methods. It is incremental in that it builds on prior obfuscation and attribution research.

The paper tackles the problem of adversarial authorship attribution for deobfuscation, showing that adversarially trained attributors degrade obfuscation effectiveness from 20-30% to 5-10% and remain more accurate than non-adversarial attributors even with incorrect assumptions.

Recent advances in natural language processing have enabled powerful privacy-invasive authorship attribution. To counter authorship attribution, researchers have proposed a variety of rule-based and learning-based text obfuscation approaches. However, existing authorship obfuscation approaches do not consider the adversarial threat model. Specifically, they are not evaluated against adversarially trained authorship attributors that are aware of potential obfuscation. To fill this gap, we investigate the problem of adversarial authorship attribution for deobfuscation. We show that adversarially trained authorship attributors are able to degrade the effectiveness of existing obfuscators from 20-30% to 5-10%. We also evaluate the effectiveness of adversarial training when the attributor makes incorrect assumptions about whether and which obfuscator was used. While there is a clear degradation in attribution accuracy, it is noteworthy that this degradation is still at or above the attribution accuracy of the attributor that is not adversarially trained at all. Our results underline the need for stronger obfuscation approaches that are resistant to deobfuscation.
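The idea of an adversarially trained attributor can be illustrated with a minimal sketch. Everything here is a hypothetical stand-in rather than the paper's setup: the `SYNONYMS` table is a toy rule-based obfuscator, the word-count profile is a toy attributor, and the four training sentences are invented. The only point is the training difference: the baseline attributor learns from original text, while the adversarial one learns from text that has already passed through the assumed obfuscator.

```python
from collections import Counter

# Hypothetical toy obfuscator: replaces a few marker words with synonyms.
SYNONYMS = {"big": "huge", "quick": "rapid", "large": "vast", "fast": "swift"}

def obfuscate(text):
    return " ".join(SYNONYMS.get(w, w) for w in text.split())

def train(docs):
    """Build a per-author word-count profile from (text, author) pairs."""
    profiles = {}
    for text, author in docs:
        profiles.setdefault(author, Counter()).update(text.split())
    return profiles

def predict(profiles, text):
    words = text.split()
    # Score each author by how often they used the test words in training.
    scores = {a: sum(c[w] for w in words) for a, c in profiles.items()}
    # Ties are broken alphabetically so the result is deterministic.
    return max(sorted(scores), key=scores.get)

def accuracy(profiles, docs):
    hits = sum(predict(profiles, text) == author for text, author in docs)
    return hits / len(docs)

# Invented toy corpus (assumption, not data from the paper).
train_docs = [
    ("the big dog was quick", "alice"),
    ("a quick big win", "alice"),
    ("the large dog was fast", "bob"),
    ("a fast large win", "bob"),
]
test_docs = [("big and quick", "alice"), ("large and fast", "bob")]
obfuscated_test = [(obfuscate(t), a) for t, a in test_docs]

# Baseline attributor: trained on original text only.
baseline = train(train_docs)
# Adversarial attributor: trained on text run through the assumed obfuscator.
adversarial = train([(obfuscate(t), a) for t, a in train_docs])

baseline_acc = accuracy(baseline, obfuscated_test)
adversarial_acc = accuracy(adversarial, obfuscated_test)
print("baseline:", baseline_acc, "adversarial:", adversarial_acc)
```

In this toy case the baseline attributor never sees the substituted words and falls back to tie-breaking, while the adversarial attributor recovers both authors. Real obfuscators and attributors are far less clean, which is why the abstract reports a partial reduction in obfuscation effectiveness instead of a complete one.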

Code Implementations: 1 repo
