CL · AI · Aug 30, 2021

The effects of data size on Automated Essay Scoring engines

arXiv:2108.13275v1 · 2 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses how to optimize training data for neural networks in production AES systems. The contribution is incremental: it builds on existing practices developed for feature-based methods.

The study investigated how data size and quality affect Automated Essay Scoring engines using feature-based, recurrent neural network, and transformer-based models, finding that each model type benefits differently from training data characteristics.

We study the effects of data size and quality on the performance of Automated Essay Scoring (AES) engines designed in accordance with three different paradigms: a frequency and hand-crafted feature-based model, a recurrent neural network model, and a pretrained transformer-based language model that is fine-tuned for classification. We expect that each type of model benefits from the size and the quality of the training data in very different ways. Standard practices for developing training data for AES engines were established with feature-based methods in mind. However, since neural networks are increasingly being considered in a production setting, this work seeks to inform how to establish better training data for neural networks that will be used in production.

