ALL-IN-ONE: Multi-Task Learning BERT Models for Evaluating Peer Assessments
This work addresses the need for high-quality peer assessments in academic fields, but its contribution is incremental: it applies existing multi-task learning techniques to a specific domain.
The paper tackles the problem of evaluating peer-review comments by detecting multiple features simultaneously with multi-task learning BERT models. BERT-based models improve F1-score by around 6% over previous GloVe-based methods on single-feature detection, and multi-task learning further improves performance while reducing model size.
Peer assessment has been widely applied across diverse academic fields over the last few decades and has proven effective. However, its benefits can only be realized with high-quality peer reviews. Previous studies have found that high-quality review comments usually exhibit several features (e.g., they contain suggestions, mention problems, or use a positive tone). Researchers have therefore attempted to evaluate peer-review comments by detecting these features with various machine learning and deep learning models. However, no prior study has investigated using a multi-task learning (MTL) model to detect multiple features simultaneously. This paper presents two MTL models for evaluating peer-review comments that leverage the state-of-the-art pre-trained language representation models BERT and DistilBERT. Our results demonstrate that BERT-based models significantly outperform previous GloVe-based methods by around 6% in F1-score on single-feature detection tasks, and that MTL further improves performance while reducing model size.
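The abstract does not spell out the architecture, but a common way to detect several features at once is hard parameter sharing: one shared encoder produces a representation of the comment, and a small classification head per feature reads from it. The sketch below illustrates that idea under stated assumptions. `toy_encoder` is a hypothetical stand-in for a BERT or DistilBERT sentence embedding, the feature names in `FEATURES` are illustrative, and summing the per-task binary cross-entropy losses with equal weights is an assumption, not necessarily the authors' training objective.

```python
from __future__ import annotations

import math
import random
from dataclasses import dataclass
from typing import Callable

# Hypothetical feature names, one binary task per feature.
FEATURES = ["suggestion", "problem", "positive_tone"]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def bce(p: float, y: int, eps: float = 1e-12) -> float:
    """Binary cross-entropy for one prediction, clipped for numerical safety."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def toy_encoder(text: str, dim: int = 8) -> list[float]:
    """Hypothetical stand-in for BERT/DistilBERT: a normalized hashed bag of words."""
    vec = [0.0] * dim
    tokens = text.lower().split()
    for tok in tokens:
        vec[sum(map(ord, tok)) % dim] += 1.0  # deterministic bucket per token
    n = max(len(tokens), 1)
    return [v / n for v in vec]


@dataclass
class LinearHead:
    """Task-specific head: a single logit computed from the shared representation."""
    weights: list[float]
    bias: float = 0.0

    def logit(self, h: list[float]) -> float:
        return sum(w * x for w, x in zip(self.weights, h)) + self.bias


class MultiTaskModel:
    def __init__(self, encoder: Callable[[str], list[float]], dim: int,
                 tasks: list[str], seed: int = 0) -> None:
        rng = random.Random(seed)
        self.encoder = encoder  # shared by every task
        self.heads = {
            t: LinearHead([rng.uniform(-0.1, 0.1) for _ in range(dim)]) for t in tasks
        }

    def predict(self, text: str) -> dict[str, float]:
        h = self.encoder(text)  # one encoder pass serves all tasks
        return {t: sigmoid(head.logit(h)) for t, head in self.heads.items()}

    def joint_loss(self, text: str, labels: dict[str, int]) -> float:
        """Unweighted sum of per-task BCE losses (an assumed MTL objective)."""
        probs = self.predict(text)
        return sum(bce(probs[t], labels[t]) for t in self.heads)


model = MultiTaskModel(toy_encoder, dim=8, tasks=FEATURES)
comment = "Consider adding an example; the proof in section 2 is unclear."
labels = {"suggestion": 1, "problem": 1, "positive_tone": 0}
print(model.predict(comment))
print(model.joint_loss(comment, labels))
```

The design point relevant to the model-size claim is that the encoder, which dominates the parameter count in a real BERT model, is stored once and shared, whereas detecting each feature with a separate single-task model would duplicate it per feature.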