UmBERTo-MTSA @ AcCompl-It: Improving Complexity and Acceptability Prediction with Multi-task Learning on Self-Supervised Annotations
This work addresses the scarcity of labeled data when fine-tuning neural language models for a specific shared task, but its contribution is incremental, as it combines existing self-supervised annotation and multi-task learning methods.
The paper tackles the problem of limited labeled data with a self-supervised data augmentation approach based on multi-task learning, which improved prediction quality in the AcCompl-it shared task at EVALITA 2020, though no concrete numbers are provided.
This work describes a self-supervised data augmentation approach used to improve the performance of learning models when only a moderate amount of labeled data is available. Multiple copies of the original model are initially trained on the downstream task. Their predictions are then used to annotate a large set of unlabeled examples. Finally, multi-task training is performed on the parallel annotations of the resulting training set, and final scores are obtained by averaging annotator-specific head predictions. Neural language models are fine-tuned using this procedure in the context of the AcCompl-it shared task at EVALITA 2020, obtaining considerable improvements in prediction quality.
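The three-step procedure described above can be sketched on toy data. This is a minimal illustration under stated assumptions, not the authors' implementation: a one-dimensional linear regressor stands in for the fine-tuned language model, a single tanh unit stands in for the shared encoder, and the names `train_annotator`, `train_multitask`, the toy target function, and all hyperparameters are hypothetical. The sketch also trains the multi-task model on the pseudo-labeled examples only, which is a simplification.

```python
import math
import random

def train_annotator(data, seed, epochs=50, lr=0.05):
    """Step 1: fit one copy of the base model (here y = w*x + b) from a seeded init."""
    rng = random.Random(seed)
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    for _ in range(epochs):
        for x, y in data:
            err = w * x + b - y
            w -= lr * err * x
            b -= lr * err
    return lambda x: w * x + b

def train_multitask(xs, annotations, seed=0, epochs=50, lr=0.05):
    """Step 3: shared encoder plus one linear head per annotator.

    annotations[i][k] is annotator k's label for xs[i].
    """
    rng = random.Random(seed)
    k = len(annotations[0])
    u, c = rng.uniform(-1, 1), 0.0                      # shared encoder parameters
    heads = [[rng.uniform(-1, 1), 0.0] for _ in range(k)]  # [scale, bias] per head
    for _ in range(epochs):
        for x, labels in zip(xs, annotations):
            h = math.tanh(u * x + c)
            errs = [a * h + d - y for (a, d), y in zip(heads, labels)]
            # Gradient reaching the shared encoder: mean over annotator heads.
            grad_h = sum(e * a for e, (a, _) in zip(errs, heads)) / k
            for head, e in zip(heads, errs):
                head[0] -= lr * e * h
                head[1] -= lr * e
            grad_pre = grad_h * (1.0 - h * h)
            u -= lr * grad_pre * x
            c -= lr * grad_pre

    def predict(x):
        h = math.tanh(u * x + c)
        per_head = [a * h + d for a, d in heads]
        # Final score: average of the annotator-specific head predictions.
        return sum(per_head) / k, per_head

    return predict

rng = random.Random(42)
target = lambda x: 2.0 * x + 1.0  # hypothetical ground truth for the toy task
labeled = [(x, target(x) + rng.gauss(0, 0.1))
           for x in (rng.uniform(-1, 1) for _ in range(20))]
unlabeled = [rng.uniform(-1, 1) for _ in range(200)]

# Step 1: several copies of the base model trained on the small labeled set.
annotators = [train_annotator(labeled, seed=s) for s in range(3)]
# Step 2: each copy annotates every unlabeled example (parallel annotations).
annotations = [[f(x) for f in annotators] for x in unlabeled]
# Step 3: multi-task training on the parallel annotations, then averaging.
predict = train_multitask(unlabeled, annotations)
score, per_head = predict(0.5)
```

In this toy setting the annotator copies converge to nearly the same solution because the base model is convex. With neural language models, differently initialized copies would presumably disagree more, and that disagreement is what the separate heads are meant to capture.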