CL, Apr 29, 2024

Can GPT-4 do L2 analytic assessment?

Stefano Bannò, Hari Krishna Vydana, Kate M. Knill, Mark J. F. Gales

arXiv:2404.18557v117.336 citations, h-index: 61, BEA

Originality: Synthesis-oriented

AI Analysis

This paper addresses the challenge of automating detailed L2 writing assessment for educational applications, but its contribution is incremental: it applies an existing model to a specific domain.

The paper tackles automated analytic scoring of second language (L2) essays, which lags behind holistic scoring, by applying GPT-4 in a zero-shot manner to a public dataset annotated with holistic scores. It reports significant correlations between the predicted analytic scores and features associated with the individual proficiency components.

Automated essay scoring (AES) to evaluate second language (L2) proficiency has been a firmly established technology used in educational contexts for decades. Although holistic scoring has seen advancements in AES that match or even exceed human performance, analytic scoring still encounters issues as it inherits flaws and shortcomings from the human scoring process. The recent introduction of large language models presents new opportunities for automating the evaluation of specific aspects of L2 writing proficiency. In this paper, we perform a series of experiments using GPT-4 in a zero-shot fashion on a publicly available dataset annotated with holistic scores based on the Common European Framework of Reference and aim to extract detailed information about their underlying analytic components. We observe significant correlations between the automatically predicted analytic scores and multiple features associated with the individual proficiency components.
