LG AIApr 30

StyleShield: Exposing the Fragility of AIGC Detectors through Continuous Controllable Style Transfer

arXiv:2605.0092463.0

Predicted impact top 45% in LG · last 90 daysOriginality Highly original

AI Analysis

This work demonstrates that current AIGC detectors are fundamentally unreliable for high-stakes applications like academic integrity, as they can be easily evaded with high semantic preservation.

StyleShield achieves 94.6% evasion against the training detector and >=99% against three unseen detectors while maintaining 0.928 semantic similarity, exposing the fragility of AIGC detectors through continuous controllable style transfer.

AI-generated content (AIGC) detectors are increasingly deployed in high-stakes settings such as academic integrity screening, yet their reliability rests on a fundamental paradox: as language models are trained on human-written corpora, the statistical boundary between AI and human writing will inevitably dissolve as models improve. Commercial incentives have further distorted this landscape -- detection services and "de-AIification" tools often operate within the same supply chain, replacing evaluation of content quality with judgment of content origin. We present StyleShield, the first flow matching framework for conditional text style transfer, operating directly in continuous token embedding space via a DiT backbone with zero-initialized cross-attention adapters conditioned on frozen Qwen-7B representations. At inference, we adapt the SDEdit paradigm from image synthesis to text embeddings, with a single parameter gamma providing smooth continuous control over the evasion-preservation trade-off. On a multi-domain Chinese benchmark, StyleShield achieves 94.6% evasion against the training detector and >=99% against three unseen detectors, maintaining 0.928 semantic similarity. We further introduce RateAudit, a document-level scheduling algorithm that demonstrates detection-rate verdicts can be set to arbitrary values, directly questioning the reliability of score-based evaluation.

View on arXiv PDF

Similar