CL · AI · Dec 6, 2024

A Practical Examination of AI-Generated Text Detectors for Large Language Models

Berkeley

arXiv:2412.05139v4 · 10.414 citations · h-index: 24 · NAACL

Originality: Synthesis-oriented

AI Analysis

This work highlights the unreliability of current detectors for preventing misuse of large language models, which is an incremental but important finding for security and content verification.

The paper critically evaluates AI-generated text detectors on unseen domains, datasets, and models, finding that they perform poorly, with a true positive rate at a 0.01 false positive rate (TPR@.01) as low as 0%, and that they are easily evaded by adversarial prompting.

The proliferation of large language models has raised growing concerns about their misuse, particularly in cases where AI-generated text is falsely attributed to human authors. Machine-generated content detectors claim to effectively identify such text under various conditions and from any language model. This paper critically evaluates these claims by assessing several popular detectors (RADAR, Wild, T5Sentinel, Fast-DetectGPT, PHD, LogRank, Binoculars) on a range of domains, datasets, and models that these detectors have not previously encountered. We employ various prompting strategies to simulate practical adversarial attacks, demonstrating that even moderate efforts can significantly evade detection. We emphasize the importance of the true positive rate at a specific false positive rate (TPR@FPR) metric and demonstrate that these detectors perform poorly in certain settings, with TPR@.01 as low as 0%. Our findings suggest that both trained and zero-shot detectors struggle to maintain high sensitivity (true positive rate) at a reasonably low false positive rate.
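The TPR@FPR metric mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's evaluation code. The function name `tpr_at_fpr`, its parameters, and the convention that a higher score means "more likely AI-generated" are assumptions made for illustration.

```python
def tpr_at_fpr(human_scores, ai_scores, target_fpr=0.01):
    """Return the true positive rate at a threshold whose false positive
    rate does not exceed target_fpr.

    Assumption: each score is a detector output where a higher value
    means "more likely AI-generated". A text is flagged when its score
    is strictly greater than the threshold.
    """
    if not human_scores or not ai_scores:
        raise ValueError("both score lists must be non-empty")
    if not 0.0 <= target_fpr <= 1.0:
        raise ValueError("target_fpr must be in [0, 1]")

    n_human = len(human_scores)
    # Number of human-written texts we may wrongly flag (rounded down,
    # so the achieved FPR never exceeds the target).
    allowed = int(target_fpr * n_human + 1e-9)

    ranked = sorted(human_scores, reverse=True)
    # Use the (allowed + 1)-th highest human score as the threshold:
    # at most `allowed` human scores are strictly above it, even with ties.
    threshold = ranked[allowed] if allowed < n_human else float("-inf")

    # Fraction of AI-generated texts that are flagged at this threshold.
    detected = sum(1 for s in ai_scores if s > threshold)
    return detected / len(ai_scores)
```

With few human samples and a strict target such as 0.01, the threshold sits at the highest human score, so any AI-generated text scoring below it goes undetected. This is one way a detector with a respectable average accuracy can still report a TPR@.01 near 0%.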
