Effective sampling for large-scale automated writing evaluation systems
This work addresses the cost of adopting AWE at large scale, but its contribution is incremental because it builds on existing sampling methods.
The paper tackles the high cost of human scoring when training automated writing evaluation (AWE) models by evaluating algorithms that select the most informative essays for training. The results show how to minimize training set size while maximizing predictive performance, reducing cost without unduly sacrificing accuracy.
Automated writing evaluation (AWE) has been shown to be an effective mechanism for quickly providing feedback to students. It has already seen wide adoption in enterprise-scale applications and is starting to be adopted in large-scale contexts. Training an AWE model has historically required a single batch of several hundred writing examples, each with a human score. This requirement limits large-scale adoption of AWE because human scoring of essays is costly. Here we evaluate algorithms for ensuring that AWE models are consistently trained using the most informative essays. Our results show how to minimize training set size while maximizing predictive performance, thereby reducing cost without unduly sacrificing accuracy. We conclude with a discussion of how to integrate this approach into large-scale AWE systems.
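The abstract does not name the selection algorithms it evaluates, so the following is only a minimal sketch of one generic way to pick essays for human scoring: a greedy max-min distance heuristic that favors essays unlike those already chosen. It assumes each essay has already been converted to a numeric feature vector; the function name `select_diverse` and the toy features are hypothetical and are not taken from the paper.

```python
import math

def select_diverse(features, budget):
    """Return indices of up to `budget` essays, chosen greedily so that each
    new pick is as far as possible from the essays already selected.

    Illustrative assumption: `features` is a list of equal-length numeric
    vectors, one per unscored essay. This is not the paper's algorithm.
    """
    if budget <= 0 or not features:
        return []
    n = len(features)
    dim = len(features[0])
    # Start from the essay farthest from the centroid of the pool.
    centroid = [sum(f[j] for f in features) / n for j in range(dim)]
    first = max(range(n), key=lambda i: math.dist(features[i], centroid))
    chosen = [first]
    # min_dist[i] is the distance from essay i to its nearest chosen essay;
    # chosen essays are marked with -1.0 so they are never picked twice.
    min_dist = [math.dist(f, features[first]) for f in features]
    min_dist[first] = -1.0
    while len(chosen) < min(budget, n):
        nxt = max(range(n), key=lambda i: min_dist[i])
        chosen.append(nxt)
        min_dist[nxt] = -1.0
        for i in range(n):
            if min_dist[i] >= 0.0:
                min_dist[i] = min(min_dist[i], math.dist(features[i], features[nxt]))
    return chosen

# Toy pool: three near-identical essays plus two that differ markedly.
pool = [[0.0, 0.0], [0.1, 0.0], [10.0, 0.0], [5.0, 5.0], [0.0, 0.1]]
to_score = select_diverse(pool, budget=3)
```

With a scoring budget of three, the sketch selects the two outlying essays and only one of the three near-duplicates, which is the intuition behind spending human-scoring effort on essays that add the most new information. A production system would likely combine such a heuristic with model-based criteria, which this sketch does not attempt.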