CL, LG | Feb 23, 2022

Short-answer scoring with ensembles of pretrained language models

arXiv:2202.11558v1 | 10 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses automated grading for educational applications, but it is incremental as it builds on existing pretrained models and ensemble techniques.

The paper tackles automated short-answer scoring by investigating ensembles of pretrained transformer-based language models. It finds that larger models alone fall short of state-of-the-art results, and that certain ensembles reach state-of-the-art performance but are too large for practical deployment.

We investigate the effectiveness of ensembles of pretrained transformer-based language models on short answer questions using the Kaggle Automated Short Answer Scoring dataset. We fine-tune a collection of popular small, base, and large pretrained transformer-based language models, and train one feature-based model on the dataset, with the aim of testing ensembles of these models. We use an early stopping mechanism and hyperparameter optimization in training. We observe that the larger models generally perform slightly better; however, they still fall short of state-of-the-art results on their own. Once we consider ensembles of models, some ensembles of several large networks do produce state-of-the-art results; however, these ensembles are too large to realistically be put in a production environment.
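The abstract does not say how the individual models' outputs are combined. As a rough illustration only, the sketch below assumes one common choice: each fine-tuned model emits a probability per score class, the probabilities are averaged across models, and the highest-scoring class is taken as the ensemble's grade. The function name `average_ensemble` and the probability values are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def average_ensemble(model_probs: Sequence[Sequence[Sequence[float]]]) -> List[int]:
    """Average class probabilities across models, then pick the best score per answer.

    model_probs[m][i][k] is model m's probability that answer i deserves score k.
    Simple averaging is an assumption here, not the paper's confirmed method.
    """
    n_models = len(model_probs)
    n_answers = len(model_probs[0])
    n_classes = len(model_probs[0][0])
    predictions = []
    for i in range(n_answers):
        # Mean probability of each score class over all models.
        mean = [
            sum(model_probs[m][i][k] for m in range(n_models)) / n_models
            for k in range(n_classes)
        ]
        # Index of the largest mean probability is the predicted score.
        predictions.append(max(range(n_classes), key=lambda k: mean[k]))
    return predictions


# Hypothetical outputs from three fine-tuned models on two answers, scores 0..2.
probs = [
    [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]],
    [[0.2, 0.5, 0.3], [0.1, 0.3, 0.6]],
    [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]],
]
ensemble_scores = average_ensemble(probs)
print(ensemble_scores)
```

In this toy input the second model disagrees with the other two on the first answer and the first model disagrees on the second, and averaging lets the majority signal win in both cases. The deployment cost noted in the abstract also shows up here: every model in the ensemble has to run on every answer before the average can be taken.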
