Exploring the Limitations of Detecting Machine-Generated Text
This work highlights reliability concerns for detection systems used in content moderation and academic integrity, identifying a critical limitation that could undermine their effectiveness in real-world applications.
The paper investigates how stylistic variation and text complexity affect the performance of machine-generated text detectors. It finds that classifiers are highly sensitive to these factors and in some cases degrade to chance-level performance, tending to misclassify easy-to-read texts while performing well on complex ones.
Recent improvements in the quality of text generated by large language models have spurred research into identifying machine-generated text. Such work often presents high-performing detectors. However, humans and machines can produce text in different styles and domains, and the impact of this variation on the performance of machine-generated text detection systems remains unclear. In this paper, we audit classification performance for detecting machine-generated text by evaluating on texts with varying writing styles. We find that classifiers are highly sensitive to stylistic changes and differences in text complexity, and in some cases degrade entirely to random classifiers. We further find that detection systems are particularly prone to misclassifying easy-to-read texts while performing well on complex texts, which raises concerns about their reliability. We recommend that future work attend to the stylistic factors and reading difficulty levels of human-written and machine-generated text.
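The kind of audit described above can be illustrated with a minimal sketch that reports detector accuracy separately for easy-to-read and complex texts. This is not the paper's method: the abstract does not name its readability measure, detectors, or data, so the sketch assumes the standard Flesch Reading Ease formula with a crude vowel-group syllable counter. The function names, the `detector` callable, and the threshold of 60 are hypothetical choices made for illustration.

```python
import re
from collections import defaultdict


def count_syllables(word):
    # Crude heuristic (an assumption, not a linguistic tool):
    # count runs of consecutive vowels, with a minimum of one.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text):
    # Standard Flesch Reading Ease: higher scores mean easier text.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text needs at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


def accuracy_by_readability(samples, detector, threshold=60.0):
    # samples: iterable of (text, label), label 1 = machine, 0 = human.
    # detector: hypothetical callable mapping text to a predicted label.
    # threshold: illustrative cut-off between "easy" and "complex".
    correct = defaultdict(int)
    total = defaultdict(int)
    for text, label in samples:
        score = flesch_reading_ease(text)
        bucket = "easy" if score >= threshold else "complex"
        total[bucket] += 1
        correct[bucket] += int(detector(text) == label)
    return {bucket: correct[bucket] / total[bucket] for bucket in total}
```

A detector whose accuracy is high in the "complex" bucket but near 0.5 in the "easy" bucket on balanced data would show the pattern the abstract reports. A real audit would use a validated readability tool and a larger, balanced evaluation set.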