CLJan 7, 2024

Empirical Study of Large Language Models as Automated Essay Scoring Tools in English Composition__Taking TOEFL Independent Writing Task for Example

arXiv:2401.03401v117 citationsh-index: 12
AI Analysis

It addresses automated essay scoring for English language testing, but is incremental as it applies an existing model to a new domain with small sample sizes.

This study assessed ChatGPT's ability to automatically score English essays for the TOEFL Independent Writing Task, finding it can provide operational functionality but with a regression effect in results.

Large language models have demonstrated exceptional capabilities in tasks involving natural language generation, reasoning, and comprehension. This study aims to construct prompts and comments grounded in the diverse scoring criteria delineated within the official TOEFL guide. The primary objective is to assess the capabilities and constraints of ChatGPT, a prominent representative of large language models, within the context of automated essay scoring. The prevailing methodologies for automated essay scoring involve the utilization of deep neural networks, statistical machine learning techniques, and fine-tuning pre-trained models. However, these techniques face challenges when applied to different contexts or subjects, primarily due to their substantial data requirements and limited adaptability to small sample sizes. In contrast, this study employs ChatGPT to conduct an automated evaluation of English essays, even with a small sample size, employing an experimental approach. The empirical findings indicate that ChatGPT can provide operational functionality for automated essay scoring, although the results exhibit a regression effect. It is imperative to underscore that the effective design and implementation of ChatGPT prompts necessitate a profound domain expertise and technical proficiency, as these prompts are subject to specific threshold criteria. Keywords: ChatGPT, Automated Essay Scoring, Prompt Learning, TOEFL Independent Writing Task

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes