CLLGFeb 2

Enhancing Automated Essay Scoring with Three Techniques: Two-Stage Fine-Tuning, Score Alignment, and Self-Training

arXiv:2602.01747v11 citationsh-index: 3
Originality Incremental advance
AI Analysis

This addresses the scalability of essay scoring in education, but it is incremental as it builds on existing methods like DualBERT.

The study tackled the problem of limited labeled data in Automated Essay Scoring by proposing three techniques, achieving 91.2% of full-data performance with only about 1,000 labeled samples in a 32-data setting.

Automated Essay Scoring (AES) plays a crucial role in education by providing scalable and efficient assessment tools. However, in real-world settings, the extreme scarcity of labeled data severely limits the development and practical adoption of robust AES systems. This study proposes a novel approach to enhance AES performance in both limited-data and full-data settings by introducing three key techniques. First, we introduce a Two-Stage fine-tuning strategy that leverages low-rank adaptations to better adapt an AES model to target prompt essays. Second, we introduce a Score Alignment technique to improve consistency between predicted and true score distributions. Third, we employ uncertainty-aware self-training using unlabeled data, effectively expanding the training set with pseudo-labeled samples while mitigating label noise propagation. We implement above three key techniques on DualBERT. We conduct extensive experiments on the ASAP++ dataset. As a result, in the 32-data setting, all three key techniques improve performance, and their integration achieves 91.2% of the full-data performance trained on approximately 1,000 labeled samples. In addition, the proposed Score Alignment technique consistently improves performance in both limited-data and full-data settings: e.g., it achieves state-of-the-art results in the full-data setting when integrated into DualBERT.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes