CL | Oct 15, 2025

How Sampling Affects the Detectability of Machine-written texts: A Comprehensive Study

arXiv:2510.13681v1 | 2 citations | h-index: 5 | Has Code | EMNLP
Originality: Incremental advance
AI Analysis

This work addresses the robustness of text detection systems for AI safety and content moderation, revealing critical vulnerabilities in current methods.

The study investigated how sampling-based decoding strategies affect the detectability of machine-generated texts, finding that minor adjustments to parameters like temperature or top-p can severely impair detector accuracy, with AUROC dropping from near-perfect levels to as low as 1% in some settings.
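An AUROC far below 50% means more than a failed detector: it means the detector's ranking is inverted, so machine-written texts receive lower "machine" scores than human-written ones. The sketch below illustrates this with the pairwise-ranking definition of AUROC. The function name `auroc` and all scores are toy assumptions for illustration and are not taken from the paper or its released code.

```python
def auroc(human_scores, machine_scores):
    """Probability that a randomly chosen machine text gets a higher
    detector score than a randomly chosen human text (ties count half)."""
    wins = 0.0
    for m in machine_scores:
        for h in human_scores:
            if m > h:
                wins += 1.0
            elif m == h:
                wins += 0.5
    return wins / (len(machine_scores) * len(human_scores))


# Toy detector scores (hypothetical): higher means "more likely machine-written".
human = [0.1, 0.2, 0.3]
machine_default = [0.7, 0.8, 0.9]    # detector separates the classes perfectly
machine_shifted = [0.05, 0.15, 0.0]  # scores collapse below the human range

print(auroc(human, machine_default))  # perfect separation
print(auroc(human, machine_shifted))  # well below 0.5: the ranking is inverted
```

In this toy setting the second value falls well under 0.5, which is the regime the summary describes when AUROC approaches 1%.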

As texts generated by Large Language Models (LLMs) become ever more common and are often indistinguishable from human-written content, research on automatic text detection has attracted growing attention. Many recent detectors report near-perfect accuracy, often boasting AUROC scores above 99%. However, these claims typically assume fixed generation settings, leaving open the question of how robust such systems are to changes in decoding strategy. In this work, we systematically examine how sampling-based decoding impacts detectability, with a focus on how subtle variations in a model's (sub)word-level distribution affect detection performance. We find that even minor adjustments to decoding parameters, such as temperature or top-p (nucleus sampling), can severely impair detector accuracy, with AUROC dropping from near-perfect levels to as low as 1% in some settings. Our findings expose critical blind spots in current detection methods and emphasize the need for more comprehensive evaluation protocols. To facilitate future research, we release a large-scale dataset encompassing 37 decoding configurations, along with our code and evaluation framework, at https://github.com/BaggerOfWords/Sampling-and-Detection
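To make concrete how decoding parameters reshape a model's token-level distribution, here is a minimal sketch of temperature scaling and top-p (nucleus) filtering in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the helper names `apply_temperature` and `top_p_filter` and the toy logits are hypothetical.

```python
import math


def apply_temperature(logits, temperature):
    """Softmax over logits divided by the temperature.
    Temperatures below 1 sharpen the distribution; above 1 flatten it."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of most probable tokens whose cumulative
    mass reaches top_p, zero out the rest, and renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass
    return filtered


# Toy next-token logits over a four-token vocabulary (hypothetical values).
logits = [2.0, 1.0, 0.5, -1.0]
base = apply_temperature(logits, 1.0)   # unmodified distribution
sharp = apply_temperature(logits, 0.7)  # lower temperature: more peaked
nucleus = top_p_filter(base, 0.9)       # least likely token is cut off

print(base)
print(sharp)
print(nucleus)
```

Both operations change which tokens are likely to be sampled, which is the kind of (sub)word-level shift a detector tuned on one fixed configuration would not have seen.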