CL | AI | Dec 6, 2024

A Practical Examination of AI-Generated Text Detectors for Large Language Models

Berkeley
arXiv:2412.05139v4 | 14 citations | h-index: 24 | NAACL
Originality: Synthesis-oriented
AI Analysis

This work highlights the unreliability of current detectors for preventing misuse of large language models, which is an incremental but important finding for security and content verification.

The paper critically evaluates AI-generated text detectors by testing them on unseen domains, datasets, and models, finding that they perform poorly with TPR@.01 as low as 0% and are easily evaded by adversarial attacks.

The proliferation of large language models has raised growing concerns about their misuse, particularly in cases where AI-generated text is falsely attributed to human authors. Machine-generated content detectors claim to effectively identify such text under various conditions and from any language model. This paper critically evaluates these claims by assessing several popular detectors (RADAR, Wild, T5Sentinel, Fast-DetectGPT, PHD, LogRank, Binoculars) on a range of domains, datasets, and models that these detectors have not previously encountered. We employ various prompting strategies to simulate practical adversarial attacks, demonstrating that even moderate efforts can significantly evade detection. We emphasize the importance of the true positive rate at a specific false positive rate (TPR@FPR) metric and demonstrate that these detectors perform poorly in certain settings, with TPR@.01 as low as 0%. Our findings suggest that both trained and zero-shot detectors struggle to maintain high sensitivity (true positive rate) while keeping the false positive rate reasonably low.
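The TPR@FPR metric mentioned above can be illustrated with a minimal sketch. It is not the paper's evaluation code: the function name `tpr_at_fpr`, the convention that a higher score means "more likely AI-generated", and the toy scores are all assumptions made for illustration. The idea is to pick the detection threshold from human-written texts alone, so that at most the target fraction of them is flagged, and then measure how many AI-generated texts exceed that threshold.

```python
import math

def tpr_at_fpr(human_scores, ai_scores, target_fpr=0.01):
    """Return (tpr, achieved_fpr, threshold) for a score-based detector.

    Assumption: a text is flagged as AI-generated when score > threshold,
    and higher scores mean "more likely AI-generated".
    """
    human = sorted(human_scores)
    n = len(human)
    if n == 0 or len(ai_scores) == 0:
        raise ValueError("both score lists must be non-empty")

    # Number of human texts we may flag without exceeding the target FPR.
    # The small epsilon guards against floating-point round-down.
    k = min(math.floor(target_fpr * n + 1e-9), n - 1)

    # Use the (k+1)-th largest human score as the threshold, so that at
    # most k human scores are strictly above it.
    threshold = human[n - k - 1]

    tpr = sum(s > threshold for s in ai_scores) / len(ai_scores)
    achieved_fpr = sum(s > threshold for s in human) / n
    return tpr, achieved_fpr, threshold


# Toy scores (hypothetical, not taken from the paper).
human = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.9]
ai = [0.65, 0.8, 0.95, 0.3]

loose = tpr_at_fpr(human, ai, target_fpr=0.25)
strict = tpr_at_fpr(human, ai, target_fpr=0.01)
print("TPR@.25:", loose[0], "TPR@.01:", strict[0])
```

With so few human samples, a 1% budget allows no false positives at all, so the threshold rises to the highest human score and most AI texts slip under it. This is why a detector can look strong on aggregate metrics yet report a very low TPR at a strict FPR.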

