CL | Mar 17, 2025

Feature Extraction and Analysis for GPT-Generated Text

arXiv:2503.13687v1 | 2 citations | h-index: 2 | Adv Artif Intell Mach Learn
Originality: Incremental advance
AI Analysis

The paper addresses the growing problem of plagiarism and authenticity in academia and other domains, but the advance is incremental because it builds on existing detection methods.

The paper tackles the problem of distinguishing human-written from GPT-generated text by extracting and analyzing stylistic features, and shows that, given sufficiently long text, the two can be differentiated with high accuracy.

With the rise of advanced natural language models like GPT, distinguishing between human-written and GPT-generated text has become increasingly challenging and crucial across various domains, including academia. The long-standing issue of plagiarism has grown more pressing, now compounded by concerns about the authenticity of information, as it is not always clear whether the presented facts are genuine or fabricated. In this paper, we present a comprehensive study of feature extraction and analysis for differentiating between human-written and GPT-generated text. By applying machine learning classifiers to these extracted features, we evaluate the significance of each feature in detection. Our results demonstrate that human and GPT-generated texts exhibit distinct writing styles, which can be effectively captured by our features. Given sufficiently long text, the two can be differentiated with high accuracy.
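To make the idea of feature extraction concrete, here is a minimal sketch of turning a text into a small numeric feature vector. The abstract does not list the features used, so the function name `extract_features` and the four features below (average sentence length, type-token ratio, punctuation rate, average word length) are illustrative assumptions, not the paper's feature set.

```python
import re
import string
from statistics import mean


def extract_features(text: str) -> dict:
    """Compute a few illustrative stylometric features.

    These features are assumptions chosen for the sketch; the paper's
    actual feature set is not given in the abstract.
    """
    # Naive sentence split on runs of terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Lowercased alphabetic tokens (apostrophes kept inside words).
    words = re.findall(r"[a-z']+", text.lower())
    if not words or not sentences:
        return {
            "avg_sentence_len": 0.0,
            "type_token_ratio": 0.0,
            "punct_rate": 0.0,
            "avg_word_len": 0.0,
        }
    punct_count = sum(ch in string.punctuation for ch in text)
    return {
        "avg_sentence_len": len(words) / len(sentences),  # words per sentence
        "type_token_ratio": len(set(words)) / len(words),  # vocabulary richness
        "punct_rate": punct_count / len(text),  # punctuation marks per character
        "avg_word_len": mean(len(w) for w in words),
    }


if __name__ == "__main__":
    sample = "The cat sat. The cat ran!"
    for name, value in extract_features(sample).items():
        print(f"{name}: {value:.3f}")
```

Vectors like these, computed for labeled human-written and GPT-generated samples, could then be passed to any standard classifier. Ratio-style features are noisy on short inputs, which is consistent with the abstract's point that sufficiently long text is needed for reliable separation.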

