CL · AI · Oct 22, 2024

AI-generated Essays: Characteristics and Implications on Automated Scoring and Academic Integrity

Yang Zhong, Jiangang Hao, Michael Fauss, Chen Li, Yuan Wang

arXiv:2410.17439v41.91 citations · h-index: 3 · Educational Measurement: Issues and Practice

Originality: Synthesis-oriented

AI Analysis

This work addresses challenges in automated scoring and academic integrity for educational and professional writing assessments, but its contribution is incremental because it builds on existing detection methods.

The study examines essays generated by large language models. It finds that existing automated scoring systems such as e-rater have limitations when applied to such essays, and that detectors trained on one model's essays can often identify texts from other models with high accuracy.

The rapid advancement of large language models (LLMs) has enabled the generation of coherent essays, making AI-assisted writing increasingly common in educational and professional settings. Using large-scale empirical data, we examine and benchmark the characteristics and quality of essays generated by popular LLMs and discuss their implications for two key components of writing assessments: automated scoring and academic integrity. Our findings highlight limitations in existing automated scoring systems, such as e-rater, when applied to essays generated or heavily influenced by AI, and identify areas for improvement, including the development of new features to capture deeper thinking and recalibrating feature weights. Despite growing concerns that the increasing variety of LLMs may undermine the feasibility of detecting AI-generated essays, our results show that detectors trained on essays generated from one model can often identify texts from others with high accuracy, suggesting that effective detection could remain manageable in practice.
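The cross-model detection setup mentioned in the abstract (train a detector on essays from one model, then test it on essays from others) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation: every name here (`accuracy`, `cross_model_matrix`, `train_length_detector`) is hypothetical, and the word-count detector is only a toy stand-in for whatever classifier and features a real study would use.

```python
from typing import Callable, Dict, List, Tuple

# An essay is (text, label), where label 1 = AI-generated and 0 = human-written.
Essay = Tuple[str, int]
Detector = Callable[[str], int]


def accuracy(detector: Detector, essays: List[Essay]) -> float:
    """Fraction of essays whose predicted label matches the true label."""
    if not essays:
        raise ValueError("cannot compute accuracy on an empty essay list")
    correct = sum(1 for text, label in essays if detector(text) == label)
    return correct / len(essays)


def cross_model_matrix(
    train: Callable[[List[Essay]], Detector],
    train_sets: Dict[str, List[Essay]],
    test_sets: Dict[str, List[Essay]],
) -> Dict[str, Dict[str, float]]:
    """result[a][b] = accuracy of a detector trained on source a, tested on source b."""
    results: Dict[str, Dict[str, float]] = {}
    for source, train_data in train_sets.items():
        detector = train(train_data)
        results[source] = {
            target: accuracy(detector, test_data)
            for target, test_data in test_sets.items()
        }
    return results


def train_length_detector(essays: List[Essay]) -> Detector:
    """Toy placeholder detector (an assumption, not from the paper):
    thresholds on word count, midway between the two class means."""
    ai_counts = [len(text.split()) for text, label in essays if label == 1]
    human_counts = [len(text.split()) for text, label in essays if label == 0]
    if not ai_counts or not human_counts:
        raise ValueError("training data must contain both classes")
    ai_mean = sum(ai_counts) / len(ai_counts)
    human_mean = sum(human_counts) / len(human_counts)
    threshold = (ai_mean + human_mean) / 2
    ai_is_longer = ai_mean >= human_mean

    def detect(text: str) -> int:
        is_longer = len(text.split()) > threshold
        # Predict "AI" when the essay falls on the AI side of the threshold.
        return int(is_longer == ai_is_longer)

    return detect
```

Off-diagonal entries of the returned matrix (trained on one source, tested on another) correspond to the cross-model generalization the abstract refers to. In practice, `train_length_detector` would be replaced by a real text classifier and the essay lists by actual human-written and LLM-generated data.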

