CL | May 6, 2022

A Data Cartography based MixUp for Pre-trained Language Models

arXiv:2205.03403v131.8631 citations | h-index: 16 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses data augmentation inefficiencies for NLP practitioners, offering an incremental improvement over existing MixUp methods.

The paper tackles the problem of suboptimal random pair selection in MixUp data augmentation by proposing TDMixUp, which uses training dynamics to combine informative samples, resulting in competitive performance with less training data and lower calibration error on BERT across NLP tasks.

MixUp is a data augmentation strategy in which additional samples are generated during training by combining random pairs of training samples and their labels. However, selecting random pairs is potentially not an optimal choice. In this work, we propose TDMixUp, a novel MixUp strategy that leverages Training Dynamics and allows more informative samples to be combined to generate new data samples. TDMixUp first measures confidence and variability (Swayamdipta et al., 2020) and Area Under the Margin (AUM) (Pleiss et al., 2020) to characterize training samples (e.g., as easy-to-learn or ambiguous), and then interpolates these characterized samples. We empirically validate that our method not only achieves competitive performance using a smaller subset of the training data compared with strong baselines, but also yields lower expected calibration error on the pre-trained language model BERT, in both in-domain and out-of-domain settings across a wide range of NLP tasks. We publicly release our code.
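
The quantities named in the abstract can be illustrated with a minimal sketch. It assumes the standard definitions: confidence and variability are the mean and standard deviation of the gold-label probability across epochs, AUM is the mean margin between the gold-label logit and the largest other logit, and MixUp is a convex combination of two samples and their labels. The function names and the toy inputs are hypothetical, and this is not the authors' released implementation; in particular, how TDMixUp selects which characterized samples to pair is not shown here.

```python
import statistics


def confidence(gold_probs):
    """Mean probability assigned to the gold label across training epochs."""
    return statistics.fmean(gold_probs)


def variability(gold_probs):
    """Population standard deviation of the gold-label probability across epochs."""
    return statistics.pstdev(gold_probs)


def area_under_margin(logits_per_epoch, gold):
    """Mean over epochs of (gold-label logit - largest other logit)."""
    margins = []
    for logits in logits_per_epoch:
        largest_other = max(v for k, v in enumerate(logits) if k != gold)
        margins.append(logits[gold] - largest_other)
    return statistics.fmean(margins)


def mixup(x_i, y_i, x_j, y_j, lam):
    """Interpolate two feature vectors and their (one-hot or soft) labels."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_i, y_j)]
    return x, y


# Toy example (made-up numbers): one sample tracked over three epochs.
probs = [0.5, 0.7, 0.9]
print(confidence(probs), variability(probs))

# Mix two toy samples with a fixed coefficient. In practice the coefficient
# is usually drawn from a Beta distribution, e.g. random.betavariate(a, a).
x_mix, y_mix = mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], lam=0.25)
print(x_mix, y_mix)
```

Under these definitions, a sample with high confidence and low variability would be treated as easy-to-learn, while one with high variability would be treated as ambiguous.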
