CL · AI · LG · May 29, 2025

Enhancing Marker Scoring Accuracy through Ordinal Confidence Modelling in Educational Assessments

arXiv:2505.23315v1 · 3 citations · h-index: 6 · ACL
Originality: Incremental advance
AI Analysis

This work addresses ethical concerns in educational assessment by improving the reliability of scores released by automated systems, though it is an incremental advance that builds on existing AES methods.

The study tackled the problem of ensuring high reliability in Automated Essay Scoring (AES) by developing a confidence estimation model that predicts whether AES-generated scores correctly place candidates in the appropriate CEFR level, achieving an F1 score of 0.97 and enabling the release of 47% of scores with 100% CEFR agreement and 99% with at least 95% agreement.
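Release figures like these come from ranking scores by their confidence and releasing only the most confident ones. Below is a minimal sketch of that selection step. It assumes each automated score has a confidence value and a boolean flag saying whether it matched the reference CEFR level. The function name `release_fraction` and the toy data are hypothetical illustrations, not the paper's code.

```python
from typing import Sequence, Tuple


def release_fraction(confidences: Sequence[float],
                     correct: Sequence[bool],
                     target_agreement: float) -> Tuple[float, float]:
    """Hypothetical helper: find the largest confidence-ranked subset of
    scores whose CEFR agreement is at least target_agreement.

    Returns (fraction of scores released, confidence of the last released score).
    Ties in confidence are broken by input order (an assumption of this sketch).
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        return 0.0, float("inf")

    # Rank score indices from most to least confident.
    order = sorted(range(len(confidences)),
                   key=lambda i: confidences[i], reverse=True)

    best_k, hits = 0, 0
    for k, idx in enumerate(order, start=1):
        hits += int(correct[idx])
        # Agreement among the top-k released scores; it is not monotonic in k,
        # so keep scanning and remember the largest k that meets the target.
        if hits / k >= target_agreement:
            best_k = k

    if best_k == 0:
        return 0.0, float("inf")  # nothing can be released at this target
    return best_k / len(confidences), confidences[order[best_k - 1]]


if __name__ == "__main__":
    conf = [0.99, 0.95, 0.90, 0.80, 0.60, 0.40]
    corr = [True, True, True, False, True, False]
    print(release_fraction(conf, corr, 1.0))  # (0.5, 0.9)
```

Sweeping `target_agreement` (for example 1.0 and 0.95) traces the trade-off between how many scores are released and how reliable the released set is.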

A key ethical challenge in Automated Essay Scoring (AES) is ensuring that scores are released only when they meet high reliability standards. Confidence modelling addresses this by attaching a reliability estimate, in the form of a confidence score, to each automated score. In this study, we frame confidence estimation as a classification task: predicting whether an AES-generated score places a candidate in the correct CEFR level. Although this is a binary decision, we exploit the inherent granularity of the scoring domain in two ways. First, we reformulate the task as an n-ary classification problem using score binning. Second, we introduce a set of novel Kernel Weighted Ordinal Categorical Cross Entropy (KWOCCE) loss functions that incorporate the ordinal structure of CEFR labels. Our best-performing model achieves an F1 score of 0.97 and enables the system to release 47% of scores with 100% CEFR agreement and 99% of scores with at least 95% CEFR agreement, compared to approximately 92% CEFR agreement for the standalone AES model when all of its predicted scores are released.
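The abstract does not give the KWOCCE formula. The sketch below shows one plausible way a kernel can inject ordinal structure into categorical cross entropy, and it is an assumption throughout. The one-hot target is replaced by a Gaussian kernel centred on the true ordinal bin, so a prediction one bin away costs less than one two bins away. The function names and the `sigma` parameter are hypothetical and do not reproduce the paper's loss definitions.

```python
import math
from typing import List, Sequence


def gaussian_kernel_targets(true_class: int, n_classes: int,
                            sigma: float = 1.0) -> List[float]:
    """Hypothetical soft targets: mass decays with ordinal distance from true_class."""
    weights = [math.exp(-((j - true_class) ** 2) / (2 * sigma ** 2))
               for j in range(n_classes)]
    total = sum(weights)
    return [w / total for w in weights]


def kernel_weighted_ordinal_cce(logits: Sequence[float], true_class: int,
                                sigma: float = 1.0) -> float:
    """Cross entropy between kernel-smoothed ordinal targets and softmax(logits).

    Illustrative only: as sigma -> 0 this reduces to standard categorical
    cross entropy on the true class.
    """
    # Numerically stable log-softmax (subtract the max logit first).
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    log_probs = [z - log_z for z in logits]
    targets = gaussian_kernel_targets(true_class, len(logits), sigma)
    return -sum(t * lp for t, lp in zip(targets, log_probs))


if __name__ == "__main__":
    # Five ordinal bins, true bin = 2; compare a near miss with a far miss.
    near_miss = kernel_weighted_ordinal_cce([0, 0, 0, 5, 0], true_class=2)
    far_miss = kernel_weighted_ordinal_cce([0, 0, 0, 0, 5], true_class=2)
    print(near_miss < far_miss)  # True
```

Here `sigma` sets how far credit spreads to neighbouring bins. A weighting that penalises probability mass by distance would be an equally valid alternative reading of "kernel weighted".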
