CL LG Aug 1, 2025

DACTYL: Diverse Adversarial Corpus of Texts Yielded from Large Language Models

arXiv:2508.00619v1 6.72 citations h-index: 2

Originality Synthesis-oriented

AI Analysis

This work addresses vulnerabilities in AI-generated text detection for applications such as academic integrity. It is incremental, however, in that it builds on existing detection methods with new data and training approaches.

The paper tackles the failure of AI-generated text detectors in real-world settings by introducing DACTYL, a challenging dataset of one-shot/few-shot and domain-specific generated texts. Many existing detectors struggle on it. In an out-of-distribution student essay scenario, the authors' best DXO-trained classifier outperforms the best BCE-trained classifier by 50.56 macro-F1 points at the lowest false positive rates for both.

Existing AIG (AI-generated) text detectors struggle in real-world settings despite succeeding in internal testing, suggesting that they may not be robust enough. We rigorously examine the machine-learning procedure to build these detectors to address this. Most current AIG text detection datasets focus on zero-shot generations, but little work has been done on few-shot or one-shot generations, where LLMs are given human texts as an example. In response, we introduce the Diverse Adversarial Corpus of Texts Yielded from Language models (DACTYL), a challenging AIG text detection dataset focusing on one-shot/few-shot generations. We also include texts from domain-specific continued-pre-trained (CPT) language models, where we fully train all parameters using a memory-efficient optimization approach. Many existing AIG text detectors struggle significantly on our dataset, indicating a potential vulnerability to one-shot/few-shot and CPT-generated texts. We also train our own classifiers using two approaches: standard binary cross-entropy (BCE) optimization and a more recent approach, deep X-risk optimization (DXO). While BCE-trained classifiers marginally outperform DXO classifiers on the DACTYL test set, the latter excels on out-of-distribution (OOD) texts. In our mock deployment scenario in student essay detection with an OOD student essay dataset, the best DXO classifier outscored the best BCE-trained classifier by 50.56 macro-F1 score points at the lowest false positive rates for both. Our results indicate that DXO classifiers generalize better without overfitting to the test set. Our experiments highlight several areas of improvement for AIG text detectors.
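The abstract reports macro-F1 at the lowest false positive rates. As a minimal sketch of what such a metric can look like, the Python below picks a decision threshold that caps the false positive rate on human-written texts and then computes macro-F1 at that threshold. This is not the authors' evaluation code: the function names, the toy scores, and the choice to pick the threshold on the same data it is scored on are all assumptions made for illustration. In practice the threshold would more likely be chosen on a held-out validation set.

```python
def threshold_at_fpr(human_scores, max_fpr):
    """Return a threshold whose false positive rate on human texts is <= max_fpr.

    A text is flagged as AI-generated when its score is strictly above the threshold.
    (Hypothetical helper, not taken from the paper.)
    """
    ranked = sorted(human_scores, reverse=True)
    allowed = int(max_fpr * len(ranked))  # how many human texts may be misflagged
    if allowed >= len(ranked):
        return float("-inf")  # every human text may be flagged
    # Only scores strictly above the (allowed + 1)-th highest human score are flagged.
    return ranked[allowed]


def macro_f1(labels, preds):
    """Unweighted mean of the per-class F1 scores for classes 0 (human) and 1 (AI)."""
    per_class = []
    for cls in (0, 1):
        tp = sum(1 for y, p in zip(labels, preds) if y == cls and p == cls)
        fp = sum(1 for y, p in zip(labels, preds) if y != cls and p == cls)
        fn = sum(1 for y, p in zip(labels, preds) if y == cls and p != cls)
        denom = 2 * tp + fp + fn
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


def macro_f1_at_fpr(labels, scores, max_fpr):
    """Macro-F1 after thresholding detector scores at a capped false positive rate."""
    human_scores = [s for y, s in zip(labels, scores) if y == 0]
    threshold = threshold_at_fpr(human_scores, max_fpr)
    preds = [1 if s > threshold else 0 for s in scores]
    return macro_f1(labels, preds), threshold


# Toy data (invented): 0 = human-written, 1 = AI-generated.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.10, 0.20, 0.30, 0.90, 0.40, 0.80, 0.95, 0.99]

for max_fpr in (0.25, 0.0):
    value, threshold = macro_f1_at_fpr(labels, scores, max_fpr)
    print(f"max_fpr={max_fpr:.2f} threshold={threshold:.2f} macro_f1={value:.4f}")
```

On this toy data, tightening the allowed false positive rate raises the threshold and lowers macro-F1, which is why comparing detectors at a fixed low false positive rate is a stricter test than comparing them at a default threshold.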
