CL · AI · May 13, 2025

LCES: Zero-shot Automated Essay Scoring via Pairwise Comparisons Using Large Language Models

arXiv:2505.08498v2 · 8 citations · h-index: 2 · EMNLP

Originality: Incremental advance

AI Analysis

The work addresses the cost and effort of essay grading in educational applications. The advance is incremental, since it builds on existing LLM-based approaches.

The paper tackles zero-shot automated essay scoring (AES) by proposing LCES, a method that has large language models (LLMs) compare essays in pairs instead of scoring each essay directly. In experiments on benchmark datasets, LCES is more accurate than conventional zero-shot methods.

Recent advances in large language models (LLMs) have enabled zero-shot automated essay scoring (AES), providing a promising way to reduce the cost and effort of essay scoring in comparison with manual grading. However, most existing zero-shot approaches rely on LLMs to directly generate absolute scores, which often diverge from human evaluations owing to model biases and inconsistent scoring. To address these limitations, we propose LLM-based Comparative Essay Scoring (LCES), a method that formulates AES as a pairwise comparison task. Specifically, we instruct LLMs to judge which of two essays is better, collect many such comparisons, and convert them into continuous scores. Considering that the number of possible comparisons grows quadratically with the number of essays, we improve scalability by employing RankNet to efficiently transform LLM preferences into scalar scores. Experiments using AES benchmark datasets show that LCES outperforms conventional zero-shot methods in accuracy while maintaining computational efficiency. Moreover, LCES is robust across different LLM backbones, highlighting its applicability to real-world zero-shot AES.
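The pipeline described in the abstract (pairwise judgments, then a RankNet-style conversion into scalar scores) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes one free scalar score per essay rather than a learned scoring network, and it replaces the LLM call with a hypothetical `toy_judge` driven by made-up `hidden_quality` values. The names `fit_scores`, `toy_judge`, and `hidden_quality` are all illustrative. The loss is the standard RankNet pairwise logistic loss, `-log(sigmoid(s_winner - s_loser))`, and only a sampled subset of the `n(n-1)/2` possible pairs is judged.

```python
import math
import random


def fit_scores(num_essays, comparisons, epochs=200, lr=0.1):
    """Fit one scalar score per essay from (winner, loser) index pairs.

    Assumption: each essay's score is a free parameter; a full RankNet
    would instead train a network mapping essay features to a score.
    """
    scores = [0.0] * num_essays
    for _ in range(epochs):
        grads = [0.0] * num_essays
        for winner, loser in comparisons:
            # RankNet models P(winner preferred) = sigmoid(s_winner - s_loser).
            p = 1.0 / (1.0 + math.exp(-(scores[winner] - scores[loser])))
            # Gradient of -log(p): -(1 - p) for the winner, +(1 - p) for the loser.
            grads[winner] -= 1.0 - p
            grads[loser] += 1.0 - p
        for i in range(num_essays):
            scores[i] -= lr * grads[i] / len(comparisons)
    return scores


def toy_judge(quality, i, j):
    """Hypothetical stand-in for the LLM: returns (winner, loser)."""
    return (i, j) if quality[i] > quality[j] else (j, i)


random.seed(0)
hidden_quality = [2.0, 5.0, 3.5, 1.0, 4.2, 2.8]  # made-up values for the demo
n = len(hidden_quality)

# All n(n-1)/2 pairs; judging every one is what grows quadratically.
all_pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]

# Judge only a sampled subset of the pairs.
sampled = random.sample(all_pairs, 10)
comparisons = [toy_judge(hidden_quality, i, j) for i, j in sampled]

scores = fit_scores(n, comparisons)
ranking = sorted(range(n), key=lambda i: scores[i], reverse=True)
print("Estimated ranking (best first):", ranking)
```

The fitted scores are continuous but only meaningful relative to each other. Mapping them onto a rubric's score range would be a separate step that this sketch does not cover.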
