AI, CL · Jun 30, 2025

Assessing GPTZero's Accuracy in Identifying AI vs. Human-Written Essays

arXiv:2506.23517v1

AI Analysis

This research examines the reliability of the AI detection tools educators use, highlighting their limitations in correctly recognizing human-authored content.

The study evaluated GPTZero's accuracy in detecting AI-generated essays of different lengths, finding that it assigned AI-written texts high AI-generation scores (91-100%) but produced false positives for some human-written essays.

As the use of AI tools by students has become more prevalent, instructors have started using AI detection tools such as GPTZero and QuillBot to detect AI-written text. However, the reliability of these detectors remains uncertain. In our study, we focused mainly on the success rate of GPTZero, the most widely used AI detector, in identifying AI-generated text across randomly submitted essays of three lengths: short (40-100 words), medium (100-350 words), and long (350-800 words). We gathered a data set of twenty-eight AI-generated papers and fifty human-written papers. Each paper was submitted individually to GPTZero, and the reported percentage of AI generation and the confidence level were recorded. A vast majority of the AI-generated papers were detected accurately (with AI-generation scores ranging from 91% to 100%), while the scores for human-written essays varied, and a handful were falsely flagged as AI-generated. These findings suggest that although GPTZero is effective at detecting purely AI-generated content, its reliability in distinguishing human-authored texts is limited. Educators should therefore exercise caution when relying solely on AI detection tools.
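The evaluation described in the abstract (three length bins, a per-essay AI-generation score, false positives on human essays) can be tallied with a short script. The sketch below is a minimal illustration under stated assumptions, not the authors' procedure: the names `Result`, `length_bin`, `rates`, and `rates_by_length` are hypothetical; an essay is treated as flagged when its AI-generation percentage reaches a 50% threshold, which the abstract does not specify; and because the stated ranges share endpoints at 100 and 350 words, each bin here includes its lower bound and excludes its upper bound, with 800 words counted as long. It does not call GPTZero; scores are assumed to be recorded by hand.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Length bins from the study. Boundary handling is an ASSUMPTION:
# lower bound inclusive, upper bound exclusive (801 so that 800 counts as long).
BINS = [("short", 40, 100), ("medium", 100, 350), ("long", 350, 801)]


def length_bin(text: str) -> Optional[str]:
    """Map an essay to a length bin by whitespace-delimited word count."""
    n = len(text.split())
    for name, lo, hi in BINS:
        if lo <= n < hi:
            return name
    return None  # outside the 40-800 word range


@dataclass
class Result:
    text: str         # the essay as submitted
    is_ai: bool       # ground truth: True if the essay was AI-generated
    ai_score: float   # detector's reported AI-generation percentage, 0-100


def rates(results: List[Result], threshold: float = 50.0) -> Tuple[float, float]:
    """Return (detection rate on AI essays, false-positive rate on human essays).

    The 50% flagging threshold is a hypothetical choice, not GPTZero's rule.
    """
    ai = [r for r in results if r.is_ai]
    human = [r for r in results if not r.is_ai]
    detected = sum(r.ai_score >= threshold for r in ai)
    false_pos = sum(r.ai_score >= threshold for r in human)
    tpr = detected / len(ai) if ai else float("nan")
    fpr = false_pos / len(human) if human else float("nan")
    return tpr, fpr


def rates_by_length(
    results: List[Result], threshold: float = 50.0
) -> Dict[str, Tuple[float, float]]:
    """Compute rates() separately for each length bin, skipping out-of-range essays."""
    groups: Dict[str, List[Result]] = defaultdict(list)
    for r in results:
        b = length_bin(r.text)
        if b is not None:
            groups[b].append(r)
    return {b: rates(rs, threshold) for b, rs in groups.items()}
```

For example, with four invented scores (not data from the study), `rates([Result("w " * 60, True, 97.0), Result("w " * 60, False, 62.0), Result("w " * 200, True, 91.0), Result("w " * 200, False, 4.0)])` returns `(1.0, 0.5)`: both AI essays are flagged, and one of the two human essays is a false positive. Reporting the false-positive rate per length bin is what would show whether short human essays are misclassified more often.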
