CL | Feb 17, 2025

ExaGPT: Example-Based Machine-Generated Text Detection for Human Interpretability

arXiv:2502.11336v1 | 3 citations | h-index: 47
Originality: Incremental advance
AI Analysis

The paper addresses the need for interpretable detection, which helps prevent mistakes such as wrongly undermining academic integrity. The advance is incremental because it builds on existing interpretable methods.

The paper tackles machine-generated text detection by introducing ExaGPT, an interpretable approach that identifies the origin of a text by comparing its spans with examples from a datastore. In experiments across four domains and three generators, it outperforms prior detectors by up to 40.9 points of accuracy at a false positive rate of 1%.

Detecting texts generated by Large Language Models (LLMs) could cause grave mistakes due to incorrect decisions, such as undermining students' academic dignity. LLM text detection thus needs to ensure the interpretability of its decisions, which helps users judge how reliable a prediction is. When humans verify whether a text is human-written or LLM-generated, they intuitively check which of the two it shares more similar spans with. However, existing interpretable detectors are not aligned with this human decision-making process and fail to offer evidence that users can easily understand. To bridge this gap, we introduce ExaGPT, an interpretable detection approach grounded in the human decision-making process for verifying the origin of a text. ExaGPT classifies a text by checking whether it shares more similar spans with human-written or with LLM-generated texts from a datastore. As evidence, this approach can provide, for each span in the text, the similar span examples that contributed to the decision. Our human evaluation demonstrates that providing similar span examples helps users judge the correctness of the decision more effectively than existing interpretable methods. Moreover, extensive experiments in four domains and three generators show that ExaGPT outperforms prior powerful detectors by up to +40.9 points of accuracy at a false positive rate of 1%.
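
The span-comparison idea in the abstract can be illustrated with a toy sketch. This is not the paper's implementation. It assumes word trigrams as spans, uses `difflib` string similarity in place of whatever retrieval ExaGPT actually uses, and decides by a simple majority vote. The names `spans`, `best_match`, and `detect` are hypothetical.

```python
from difflib import SequenceMatcher


def spans(text, n=3):
    """Split text into overlapping word n-grams (an assumed span definition)."""
    words = text.lower().split()
    if len(words) <= n:
        return [" ".join(words)]
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def best_match(span, datastore):
    """Return (similarity, example) for the closest span in a non-empty datastore."""
    return max((SequenceMatcher(None, span, ex).ratio(), ex) for ex in datastore)


def detect(text, human_texts, llm_texts, n=3):
    """Label each span by its closest datastore example, then take a majority vote."""
    human_store = [s for t in human_texts for s in spans(t, n)]
    llm_store = [s for t in llm_texts for s in spans(t, n)]
    votes = {"human": 0, "llm": 0}
    evidence = []  # (span, label, most similar example) shown to the user
    for span in spans(text, n):
        h_sim, h_ex = best_match(span, human_store)
        l_sim, l_ex = best_match(span, llm_store)
        # Ties fall back to "human" (an arbitrary choice made for this sketch).
        label = "llm" if l_sim > h_sim else "human"
        votes[label] += 1
        evidence.append((span, label, l_ex if label == "llm" else h_ex))
    decision = "llm" if votes["llm"] > votes["human"] else "human"
    return decision, evidence


human_texts = ["the cat sat on the mat yesterday"]
llm_texts = ["as an ai language model i cannot help"]
decision, evidence = detect("as an ai language model i will help", human_texts, llm_texts)
print(decision)
for span, label, example in evidence:
    print(f"{span!r} -> {label} (closest example: {example!r})")
```

The returned evidence list pairs every span with the datastore example that drove its label, which is the kind of per-span evidence the abstract describes presenting to users.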

