Regularization via Adaptive Pairwise Label Smoothing
This work addresses the problem of overconfident predictions in deep learning models. It is aimed at researchers and practitioners and offers an incremental improvement over existing label smoothing techniques.
This paper introduces Pairwise Label Smoothing (PLS), a novel regularization technique that smooths labels using pairs of samples. PLS automatically learns the smoothing distribution mass for each input pair, leading to models that produce less confident predictions and achieve up to 30% relative classification error reduction compared to existing Label Smoothing methods.
Label Smoothing (LS) is an effective regularizer for improving the generalization of state-of-the-art deep models. For each training sample, LS smooths the one-hot encoded training signal by distributing part of its distribution mass over the non-ground-truth classes, which penalizes networks for generating overconfident output distributions. This paper introduces a novel label smoothing technique called Pairwise Label Smoothing (PLS). PLS takes a pair of samples as input. Smoothing with a pair of ground-truth labels enables PLS to preserve the relative distance between the two truth labels while further softening the distance between the truth labels and the other targets, resulting in models that produce much less confident predictions than LS. Also, unlike current LS methods, which typically require finding a global smoothing distribution mass through a cross-validation search, PLS automatically learns the distribution mass for each input pair during training. We empirically show that PLS significantly outperforms LS and the baseline models, achieving up to a 30% relative reduction in classification error. We also show visually that, while achieving these accuracy gains, PLS tends to produce very low winning softmax scores.
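To make the smoothing step concrete, the following is a minimal sketch of how the training targets could be built. The function names (`smooth_label`, `smooth_pair`) and the fixed `alpha` argument are illustrative assumptions, not taken from the paper. In particular, the pairwise target shown here (equal mass on the two truth labels, the remainder spread uniformly over the other classes) is only one plausible reading of the description above, and PLS learns the smoothing mass per input pair rather than receiving it as a constant.

```python
def smooth_label(label, num_classes, alpha):
    """Standard LS target: move mass alpha from the truth class
    to the non-ground-truth classes, spread uniformly."""
    off_value = alpha / (num_classes - 1)
    return [1.0 - alpha if k == label else off_value
            for k in range(num_classes)]


def smooth_pair(label_a, label_b, num_classes, alpha):
    """Hypothetical pairwise target (an assumption, not the paper's formula).

    The two truth labels share the retained mass (1 - alpha) equally,
    so their relative distance is preserved, and alpha is spread
    uniformly over the remaining classes.
    """
    if label_a == label_b:
        # Assumption: identical labels reduce to single-label smoothing.
        return smooth_label(label_a, num_classes, alpha)
    truth = {label_a, label_b}
    # Requires num_classes > 2 so that some non-truth class exists.
    off_value = alpha / (num_classes - len(truth))
    on_value = (1.0 - alpha) / len(truth)
    return [on_value if k in truth else off_value
            for k in range(num_classes)]


if __name__ == "__main__":
    # alpha is fixed here for illustration only; PLS learns it per pair.
    print(smooth_label(2, num_classes=4, alpha=0.3))
    print(smooth_pair(0, 2, num_classes=4, alpha=0.3))
```

Under these assumptions, the largest entry of a pairwise target can never exceed 0.5 when the two labels differ, whereas a standard LS target keeps 1 - alpha on a single class. This is consistent with the observation that PLS tends to produce low winning softmax scores.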