CL | Mar 10, 2025

Detection Avoidance Techniques for Large Language Models

Sinclair Schneider, Florian Steuber, Joao A. G. Schneider, Gabi Dreo Rodosek

arXiv:2503.07595v1 | 2.71 citations | h-index: 3 | Data & Policy

Originality: Incremental advance

AI Analysis

This work addresses vulnerabilities in AI text detection systems, which matters for mitigating risks such as the spread of fake news. It is incremental because it builds on existing evasion methods.

The paper evaluates evasion techniques against AI-text detectors such as DetectGPT and finds that rephrasing evades zero-shot detectors in more than 90% of cases while keeping the text highly similar to the original.

The increasing popularity of large language models has not only led to widespread use but has also brought various risks, including the potential for systematically spreading fake news. Consequently, the development of classification systems such as DetectGPT has become vital. These detectors are vulnerable to evasion techniques, as demonstrated in a series of experiments: systematic changes to the generative model's temperature showed shallow-learning detectors to be the least reliable. Fine-tuning the generative model via reinforcement learning circumvented BERT-based detectors. Finally, rephrasing led to a >90% evasion rate against zero-shot detectors like DetectGPT, although the texts stayed highly similar to the originals. A comparison with existing work highlights the better performance of the presented methods. Possible implications for society and further research are discussed.
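The abstract's first experiment varies the sampling temperature of the generative model. As a rough illustration of what that parameter does (not the authors' code), the sketch below applies standard temperature scaling to a softmax over a set of made-up next-token logits. The function names and the logit values are assumptions chosen for this example only.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw logits into probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; higher means a flatter distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.1]  # hypothetical next-token logits, not from the paper
    for t in (0.5, 1.0, 1.5):
        probs = softmax_with_temperature(logits, t)
        print(t, [round(p, 3) for p in probs], round(entropy(probs), 3))
```

Higher temperatures flatten the distribution, so sampled text contains more low-probability tokens. That shift in token statistics is one plausible reason a detector relying on shallow statistical features could become less reliable, though the paper itself should be consulted for the actual experimental setup.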

