CL, AI · Feb 17, 2025

LLMs can Perform Multi-Dimensional Analytic Writing Assessments: A Case Study of L2 Graduate-Level Academic English Writing

arXiv:2502.11368v2 · 6 citations · h-index: 9 · ACL
AI Analysis

The work addresses the need for scalable, cost-efficient writing assessment tools for educators and institutions working with L2 graduate students, though its contribution is incremental: it applies existing LLMs to a specific domain.

The paper investigates whether large language models (LLMs) can perform multi-dimensional analytic writing assessments of L2 graduate-level academic English writing, and finds that they generate reasonably good and generally reliable scores and comments across 9 criteria when compared against human expert assessments.
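Judging whether LLM scores are "generally reliable" relative to human experts means measuring agreement on ordinal rubric scores, criterion by criterion. The summary does not say which agreement statistic the authors use, so the sketch below assumes quadratic weighted kappa, a common choice for ordinal essay scores; the function name, score range, and toy scores are all hypothetical.

```python
from collections import Counter
from typing import Sequence


def quadratic_weighted_kappa(
    rater_a: Sequence[int], rater_b: Sequence[int], min_score: int, max_score: int
) -> float:
    """Cohen's kappa with quadratic disagreement weights (assumed metric)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("score lists must be non-empty and equal in length")
    if max_score <= min_score:
        raise ValueError("max_score must exceed min_score")
    if any(not min_score <= s <= max_score for s in (*rater_a, *rater_b)):
        raise ValueError("score outside the declared range")
    k = max_score - min_score + 1
    n = len(rater_a)
    # Observed counts of each (rater_a, rater_b) score pair.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_score][b - min_score] += 1
    # Marginal score histograms, used for the chance-expected counts.
    hist_a = Counter(a - min_score for a in rater_a)
    hist_b = Counter(b - min_score for b in rater_b)
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2  # larger gaps cost more
            num += weight * observed[i][j]
            den += weight * hist_a[i] * hist_b[j] / n
    # den == 0 only when both raters give one identical constant score.
    return 1.0 - num / den if den else 1.0


# Hypothetical usage: toy scores for one criterion on an assumed 1-5 scale.
human_scores = [3, 4, 2, 5, 4, 3]
llm_scores = [3, 4, 3, 4, 4, 2]
print(quadratic_weighted_kappa(human_scores, llm_scores, 1, 5))
```

In a per-criterion study, this function would be called once per analytic criterion, giving one agreement value per dimension rather than a single pooled number.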

The paper explores the performance of LLMs in the context of multi-dimensional analytic writing assessments, i.e., their ability to provide both scores and comments based on multiple assessment criteria. Using a corpus of literature reviews written by L2 graduate students and assessed by human experts against 9 analytic criteria, we prompt several popular LLMs to perform the same task under various conditions. To evaluate the quality of feedback comments, we apply a novel feedback comment quality evaluation framework which, unlike existing methods that rely on manual judgments, is interpretable, cost-efficient, scalable, and reproducible. We find that LLMs can generate reasonably good and generally reliable multi-dimensional analytic assessments. We release our corpus and code for reproducibility.
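A multi-dimensional analytic assessment pairs a score with a comment for each of the 9 criteria. The sketch below shows one way to validate such output when an LLM is prompted to reply in JSON; the reply shape, the placeholder criterion names (`criterion_1` … `criterion_9`), and the 1-5 score scale are assumptions for illustration, since the actual criteria and prompt format are not given here.

```python
import json
from dataclasses import dataclass

# Hypothetical placeholders: the paper's nine criterion names are not listed here.
CRITERIA = [f"criterion_{i}" for i in range(1, 10)]
MIN_SCORE, MAX_SCORE = 1, 5  # assumed rubric scale


@dataclass(frozen=True)
class CriterionAssessment:
    criterion: str
    score: int
    comment: str


def parse_assessment(raw_reply: str) -> list[CriterionAssessment]:
    """Validate a reply shaped as {criterion: {"score": int, "comment": str}}."""
    data = json.loads(raw_reply)
    missing = [c for c in CRITERIA if c not in data]
    if missing:
        raise ValueError(f"reply is missing criteria: {missing}")
    results = []
    for criterion in CRITERIA:
        entry = data[criterion]
        score = entry.get("score")
        # Reject non-integers (bool is excluded explicitly) and out-of-range scores.
        if isinstance(score, bool) or not isinstance(score, int):
            raise ValueError(f"{criterion}: score {score!r} is not an integer")
        if not MIN_SCORE <= score <= MAX_SCORE:
            raise ValueError(f"{criterion}: score {score} outside {MIN_SCORE}-{MAX_SCORE}")
        comment = str(entry.get("comment", "")).strip()
        if not comment:
            raise ValueError(f"{criterion}: empty comment")
        results.append(CriterionAssessment(criterion, score, comment))
    return results


# Hypothetical usage with a synthetic reply (not real model output).
reply = json.dumps({c: {"score": 4, "comment": "Placeholder comment."} for c in CRITERIA})
for item in parse_assessment(reply)[:2]:
    print(item.criterion, item.score, item.comment)
```

Failing fast on missing criteria or malformed scores keeps invalid replies out of later agreement and comment-quality analyses instead of silently dropping dimensions.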

Code Implementations: 1 repo