CL · LG · Mar 14, 2022

On the Calibration of Pre-trained Language Models using Mixup Guided by Area Under the Margin and Saliency

arXiv:2203.07559v1 · 652 citations · h-index: 16
Originality: Incremental advance
AI Analysis

This work addresses model calibration for natural language understanding, which is important for reliable AI applications, but it is incremental, as it builds on existing mixup and calibration techniques.

The paper tackles the problem of improving calibration in pre-trained language models for natural language understanding tasks by proposing a novel mixup strategy guided by Area Under the Margin and saliency maps, achieving the lowest expected calibration error compared to baselines while maintaining competitive accuracy.
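Expected calibration error (ECE), the headline metric here, groups predictions into confidence bins and sums, over bins, the gap |accuracy − mean confidence| weighted by the fraction of samples in the bin. The sketch below is a minimal illustration, assuming the common convention of 10 equal-width bins (not a detail stated on this page); `expected_calibration_error` and `softmax` are hypothetical helper names. The `temperature` argument only shows where temperature scaling enters: logits are divided by a scalar T, which in practice is fit on held-out data, before the softmax.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Softmax over logits divided by a temperature (T > 1 lowers the top probability)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def expected_calibration_error(
    confidences: Sequence[float],
    predictions: Sequence[int],
    labels: Sequence[int],
    n_bins: int = 10,
) -> float:
    """Equal-width-bin ECE: sum over bins of (bin size / n) * |accuracy - mean confidence|."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Bins are (lo, hi]; the first bin also accepts a confidence of exactly 0.
        idx = [i for i, c in enumerate(confidences)
               if lo < c <= hi or (b == 0 and c == 0.0)]
        if not idx:
            continue
        acc = sum(predictions[i] == labels[i] for i in idx) / len(idx)
        conf = sum(confidences[i] for i in idx) / len(idx)
        ece += (len(idx) / n) * abs(acc - conf)
    return ece


# Toy data: two 0.95-confidence predictions (one wrong), two 0.65-confidence ones (both right).
confidences = [0.95, 0.95, 0.65, 0.65]
predictions = [1, 1, 0, 0]
labels = [1, 0, 0, 0]
ece = expected_calibration_error(confidences, predictions, labels)
print(round(ece, 3))  # 0.4
```

Here the high-confidence bin is overconfident (accuracy 0.5 vs. confidence 0.95) and the low-confidence bin is underconfident (1.0 vs. 0.65); each contributes half of its gap, giving 0.225 + 0.175 = 0.4. A perfectly calibrated model would score 0.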

A well-calibrated neural model produces confidence (probability outputs) closely approximated by the expected accuracy. While prior studies have shown that mixup training as a data augmentation technique can improve model calibration on image classification tasks, little is known about using mixup for model calibration on natural language understanding (NLU) tasks. In this paper, we explore mixup for model calibration on several NLU tasks and propose a novel mixup strategy for pre-trained language models that improves model calibration further. Our proposed mixup is guided by both the Area Under the Margin (AUM) statistic (Pleiss et al., 2020) and the saliency map of each sample (Simonyan et al., 2013). Moreover, we combine our mixup strategy with model miscalibration correction techniques (i.e., label smoothing and temperature scaling) and provide detailed analyses of their impact on our proposed mixup. We focus on systematically designing experiments on three NLU tasks: natural language inference, paraphrase detection, and commonsense reasoning. Our method achieves the lowest expected calibration error compared to strong baselines on both in-domain and out-of-domain test samples while maintaining competitive accuracy.
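Two ingredients named in the abstract can be sketched in a few lines. AUM (Pleiss et al., 2020) averages, over training epochs, the margin between a sample's assigned-class logit and its largest other logit, so a low or negative AUM flags a hard or possibly mislabeled sample. Mixup forms a convex combination of two inputs and of their label vectors with a weight λ drawn from Beta(α, α); for language models it is applied to continuous vectors (e.g. sentence embeddings), since token sequences cannot be interpolated directly. The pairing rule below (split samples at a hypothetical AUM `threshold` and pair each high-AUM sample with a random low-AUM one) is an illustrative simplification, not necessarily the paper's procedure, and the saliency-map guidance is omitted entirely. All function names and the α value are assumptions.

```python
import random
from typing import List, Sequence, Tuple


def margin(logits: Sequence[float], label: int) -> float:
    """One-epoch AUM margin: assigned-class logit minus the largest other logit."""
    largest_other = max(z for k, z in enumerate(logits) if k != label)
    return logits[label] - largest_other


def area_under_margin(logits_per_epoch: Sequence[Sequence[float]], label: int) -> float:
    """AUM: per-epoch margins averaged over training epochs."""
    margins = [margin(logits, label) for logits in logits_per_epoch]
    return sum(margins) / len(margins)


def mixup(
    x_i: Sequence[float], x_j: Sequence[float],
    y_i: Sequence[float], y_j: Sequence[float], lam: float,
) -> Tuple[List[float], List[float]]:
    """Convex combination of two input vectors and their label vectors."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_i, y_j)]
    return x, y


def aum_guided_pairs(
    aums: Sequence[float], threshold: float, rng: random.Random
) -> List[Tuple[int, int]]:
    """HYPOTHETICAL pairing: match each high-AUM (easy) index with a random low-AUM (hard) index."""
    easy = [i for i, a in enumerate(aums) if a >= threshold]
    hard = [i for i, a in enumerate(aums) if a < threshold]
    if not hard:
        return []
    return [(i, rng.choice(hard)) for i in easy]


rng = random.Random(0)

# Label 0 over three epochs: margins are 1.0, 2.5 and -0.5, so AUM = 1.0.
aum = area_under_margin([[2.0, 1.0, 0.5], [3.0, 0.5, 0.2], [1.0, 1.5, 0.0]], label=0)
print(aum)  # 1.0

# Toy embeddings and one-hot labels for four samples with precomputed AUMs.
embeddings = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.2, 0.8]]
one_hot = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
pairs = aum_guided_pairs([1.0, -0.5, 2.0, 0.1], threshold=0.5, rng=rng)

mixed = []
for i, j in pairs:
    lam = rng.betavariate(0.4, 0.4)  # lambda ~ Beta(alpha, alpha); alpha = 0.4 is arbitrary
    mixed.append(mixup(embeddings[i], embeddings[j], one_hot[i], one_hot[j], lam))
```

In a real training loop, the margins would be recorded from the model's logits at the end of each epoch, and the mixed vectors would be trained against the mixed soft labels (e.g. with a cross-entropy that accepts probability targets).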
