CL | Mar 28, 2025

SKDU at De-Factify 4.0: Natural Language Features for AI-Generated Text-Detection

Shrikant Malviya, Pablo Arnau-González, Miguel Arevalillo-Herráez, Stamos Katsigiannis

arXiv:2503.22338v1 | 6.7 | 2 citations | h-index: 17 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses AI-generated text detection for applications such as misinformation prevention. Its contribution is incremental: it builds on existing feature sets (RAIDAR and NELA) and existing classifiers.

The paper tackles the problem of distinguishing human-written from AI-generated text with a two-stage pipeline of feature extraction followed by classification. It finds that NELA features outperform RAIDAR features on both the binary and the multi-class task, and that XGBoost is the most effective of the classifiers tested.

The rapid advancement of large language models (LLMs) has introduced new challenges in distinguishing human-written text from AI-generated content. In this work, we explored a pipelined approach for AI-generated text detection that includes a feature extraction step (i.e. prompt-based rewriting features inspired by RAIDAR and content-based features derived from the NELA toolkit) followed by a classification module. Comprehensive experiments were conducted on the Defactify4.0 dataset, evaluating two tasks: binary classification to differentiate human-written and AI-generated text, and multi-class classification to identify the specific generative model used to generate the input text. Our findings reveal that NELA features significantly outperform RAIDAR features in both tasks, demonstrating their ability to capture nuanced linguistic, stylistic, and content-based differences. Combining RAIDAR and NELA features provided minimal improvement, highlighting the redundancy introduced by less discriminative features. Among the classifiers tested, XGBoost emerged as the most effective, leveraging the rich feature sets to achieve high accuracy and generalisation.
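The feature-extraction stage described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `rewrite_features` imitates the RAIDAR idea by comparing a text with an LLM rewrite that is assumed to be supplied by the caller, and `style_features` uses three simple statistics as a stand-in for the much richer NELA toolkit features. All function and feature names here are hypothetical.

```python
import difflib
import string


def rewrite_features(text: str, rewritten: str) -> dict:
    """RAIDAR-inspired features: how much does a rewrite change the input?

    `rewritten` is assumed to come from prompting an LLM to rewrite `text`;
    obtaining it is outside the scope of this sketch.
    """
    a, b = text.split(), rewritten.split()
    return {
        # Word-level similarity in [0, 1]; 1.0 means the rewrite is identical.
        "rewrite_similarity": difflib.SequenceMatcher(None, a, b).ratio(),
        # Relative length of the rewrite compared with the original.
        "rewrite_len_ratio": len(b) / max(len(a), 1),
    }


def style_features(text: str) -> dict:
    """Simple style statistics standing in for NELA toolkit features."""
    words = text.split()
    n_words = max(len(words), 1)
    n_chars = max(len(text), 1)
    stripped = [w.strip(string.punctuation) for w in words]
    return {
        "avg_word_len": sum(len(w) for w in stripped) / n_words,
        "type_token_ratio": len({w.lower() for w in stripped}) / n_words,
        "punct_ratio": sum(c in string.punctuation for c in text) / n_chars,
    }


def extract_features(text: str, rewritten: str) -> list:
    """Concatenate both feature groups into one vector, ordered by feature name."""
    feats = {**rewrite_features(text, rewritten), **style_features(text)}
    return [feats[name] for name in sorted(feats)]
```

The resulting vectors, one per document, would then be passed to a classifier such as XGBoost for either the binary or the multi-class task; that step is omitted here to keep the sketch dependency-free.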
