CL AI May 16, 2024

StyloAI: Distinguishing AI-Generated Content with Stylometric Analysis

arXiv:2405.10129v19.642 citations h-index: 3 AIED Companion

Originality: Incremental advance

AI Analysis

The paper addresses ethical concerns across various sectors by providing a method to detect AI-generated content. The advance is incremental because it builds on existing stylometric analysis techniques.

The study tackles the problem of distinguishing AI-generated text from human-authored content by proposing StyloAI, a model that feeds 31 stylometric features to a Random Forest classifier. It reports accuracy of 81% on the test set of the AuTextification dataset and 98% on the Education dataset.

The emergence of large language models (LLMs) capable of generating realistic texts and images has sparked ethical concerns across various sectors. In response, researchers in academia and industry are actively exploring methods to distinguish AI-generated content from human-authored material. However, a crucial question remains: What are the unique characteristics of AI-generated text? Addressing this gap, this study proposes StyloAI, a data-driven model that uses 31 stylometric features to identify AI-generated texts by applying a Random Forest classifier on two multi-domain datasets. StyloAI achieves accuracy rates of 81% and 98% on the test set of the AuTextification dataset and the Education dataset, respectively. This approach surpasses the performance of existing state-of-the-art models and provides valuable insights into the differences between AI-generated and human-authored texts.
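
To make the idea of stylometric features concrete, here is a minimal sketch in Python of how a few such features might be computed from raw text. The function name `stylometric_features` and the five features shown are illustrative assumptions; they are not the 31 features used in StyloAI, which this summary does not list.

```python
import re
import string


def stylometric_features(text: str) -> dict[str, float]:
    """Compute a few simple stylometric features for one text.

    Illustrative only: these are common stylometric measures,
    not the paper's actual feature set.
    """
    # Lowercased alphabetic tokens (apostrophes kept for contractions).
    words = re.findall(r"[A-Za-z']+", text.lower())
    # Naive sentence split on runs of terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]

    n_words = len(words)
    n_sents = len(sentences)
    n_chars = len(text)
    n_punct = sum(ch in string.punctuation for ch in text)

    return {
        "word_count": float(n_words),
        # Lexical diversity: unique words divided by total words.
        "type_token_ratio": len(set(words)) / n_words if n_words else 0.0,
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "avg_sentence_length": n_words / n_sents if n_sents else 0.0,
        "punctuation_ratio": n_punct / n_chars if n_chars else 0.0,
    }


features = stylometric_features("The cat sat. The cat ran!")
```

Feature dictionaries like this one could be converted to fixed-order numeric vectors and passed, together with human/AI labels, to a Random Forest implementation such as scikit-learn's `RandomForestClassifier`. That training step is omitted here because it depends on the datasets.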
