CL · LG · Feb 22, 2021

MixUp Training Leads to Reduced Overfitting and Improved Calibration for the Transformer Architecture

arXiv:2102.11402v1 · 14 citations
Originality: Incremental advance
AI Analysis

This work addresses overfitting and calibration issues in NLU tasks, offering incremental improvements for transformer-based models.

The study tackles the challenge of applying MixUp data augmentation to natural language understanding by proposing MixUp variants at the input, manifold (hidden-layer), and sentence-embedding levels of the transformer. It reports reductions of up to 50% in test loss and calibration error.
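Calibration error measures how far a model's stated confidence drifts from its actual accuracy. The listing does not say which estimator the paper reports, so the sketch below assumes the widely used expected calibration error (ECE) with equal-width confidence bins; the function name `expected_calibration_error`, the bin count, and the toy predictions are illustrative choices, not taken from the paper.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """Equal-width-bin ECE (assumed variant; the paper's exact estimator is not stated here).

    probs: (n_samples, n_classes) predicted class probabilities.
    labels: (n_samples,) integer gold labels.
    """
    confidences = probs.max(axis=1)            # confidence of the predicted class
    predictions = probs.argmax(axis=1)
    correct = (predictions == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            # |accuracy - mean confidence| in this bin, weighted by the bin's share of samples
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap
    return ece

# Toy predictions for 4 examples and 2 classes (made-up numbers).
probs = np.array([[0.95, 0.05], [0.65, 0.35], [0.15, 0.85], [0.75, 0.25]])
labels = np.array([0, 1, 1, 0])
ece = expected_calibration_error(probs, labels)
```

Here each toy prediction falls in its own bin, so ECE reduces to the mean absolute gap between correctness and confidence: (0.05 + 0.65 + 0.15 + 0.25) / 4 ≈ 0.275. A lower value means confidences track accuracy more closely, which is the sense in which MixUp is reported to improve calibration.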

MixUp is a computer vision data augmentation technique that uses convex interpolations of input data and their labels to enhance model generalization during training. However, the application of MixUp to the natural language understanding (NLU) domain has been limited, due to the difficulty of interpolating text directly in the input space. In this study, we propose MixUp methods at the Input, Manifold, and sentence embedding levels for the transformer architecture, and apply them to finetune the BERT model for a diverse set of NLU tasks. We find that MixUp can improve model performance, as well as reduce test loss and model calibration error by up to 50%.
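To make the convex interpolation described in the abstract concrete, here is a minimal NumPy sketch, assuming one-hot labels and a single mixing coefficient λ drawn from Beta(α, α) per batch, as in the original vision formulation of MixUp. The helper names `mixup` and `soft_cross_entropy`, the value α = 0.4, and the random stand-in embeddings and linear head are hypothetical illustrations, not the paper's code or hyperparameters. The same mixing step applies at each of the three levels; only what is passed as `features` changes (padded token embeddings, one layer's hidden states, or pooled sentence embeddings).

```python
import numpy as np

def mixup(features, onehot_labels, alpha=0.4, rng=None):
    """Hypothetical helper: mix a batch with a shuffled copy of itself.

    features: (batch, ...) array; could be padded token embeddings (input level),
        hidden states from one transformer layer (manifold level), or pooled
        sentence embeddings (sentence-embedding level).
    onehot_labels: (batch, n_classes) array.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)                # mixing coefficient in [0, 1]
    perm = rng.permutation(features.shape[0])   # random partner for each example
    mixed_x = lam * features + (1.0 - lam) * features[perm]
    mixed_y = lam * onehot_labels + (1.0 - lam) * onehot_labels[perm]
    return mixed_x, mixed_y, lam

def soft_cross_entropy(logits, soft_targets):
    """Batch-averaged cross-entropy against mixed (soft) label vectors."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-(soft_targets * log_probs).sum(axis=1).mean())

# Usage: random stand-ins for 4 sentence embeddings of size 8 and 3 classes.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
y = np.eye(3)[[0, 2, 1, 0]]
mixed_x, mixed_y, lam = mixup(x, y, alpha=0.4, rng=rng)
W = rng.normal(size=(8, 3))                     # stand-in linear classifier head
loss = soft_cross_entropy(mixed_x @ W, mixed_y)
```

Training on `mixed_y` with a soft-label cross-entropy is equivalent to weighting the two original labels' losses by λ and 1 − λ. Note that input-level mixing of text requires both sequences to share a length (e.g. via padding), which is one reason interpolating directly in the raw token space is awkward.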
