LG · Mar 2, 2023

Over-training with Mixup May Hurt Generalization

arXiv:2303.01475v1 · 16 citations · h-index: 8
AI Analysis

For deep learning practitioners, this work highlights a potential pitfall in a widely used regularization technique. Its contribution is incremental, revealing an overlooked training dynamic.

The paper shows that Mixup regularization can produce a U-shaped generalization curve, in which test performance first improves and then decays with prolonged training, especially on smaller datasets. The authors attribute the decay to overfitting the label noise that Mixup introduces into its synthetic samples.
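To make "label noise introduced by Mixup" concrete, here is a toy sketch of our own (not from the paper). On a hypothetical four-point 1-D dataset, two different Mixup pairs land on the same synthetic input but carry conflicting soft labels, so the label attached to that input depends on which pair happened to be drawn.

```python
import numpy as np

# Hypothetical toy dataset: four 1-D points, one-hot labels for 2 classes.
x = np.array([0.0, 0.5, 1.5, 2.0])
y = np.array([[1.0, 0.0],   # x=0.0 -> class 0
              [0.0, 1.0],   # x=0.5 -> class 1
              [0.0, 1.0],   # x=1.5 -> class 1
              [1.0, 0.0]])  # x=2.0 -> class 0


def mix(i, j, lam):
    """Mixup of samples i and j with weight lam on sample i."""
    return lam * x[i] + (1 - lam) * x[j], lam * y[i] + (1 - lam) * y[j]


x_a, y_a = mix(0, 3, 0.5)  # mixes the two class-0 points
x_b, y_b = mix(1, 2, 0.5)  # mixes the two class-1 points
print(x_a, y_a)  # 1.0 [1. 0.]
print(x_b, y_b)  # 1.0 [0. 1.]
# Same synthetic input, opposite labels: from the model's point of view the
# label of x = 1.0 is random, and fitting it exactly means fitting noise.
```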

Mixup, which creates synthetic training instances by linearly interpolating random sample pairs, is a simple yet effective regularization technique for boosting the performance of deep models trained with SGD. In this work, we report a previously unobserved phenomenon in Mixup training: on a number of standard datasets, the performance of Mixup-trained models starts to decay after training for a large number of epochs, giving rise to a U-shaped generalization curve. This behavior is further aggravated when the size of the original dataset is reduced. To help understand this behavior, we show theoretically that Mixup training may introduce undesired data-dependent label noise into the synthesized data. By analyzing a least-squares regression problem with a random feature model, we explain why noisy labels may cause the U-shaped curve to occur: Mixup improves generalization by fitting the clean patterns in the early training stage, but as training progresses, it starts to over-fit the noise in the synthetic data. Extensive experiments on a variety of benchmark datasets validate this explanation.
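The setting described in the abstract (Mixup data, a random feature model, least-squares loss, gradient-based training) can be sketched as below. This is an illustrative toy, not the paper's construction: the ReLU features, the `tanh` target, the fixed pool of Mixup samples, and every function and parameter name are our own assumptions. The nonlinear target makes mixed labels differ from the target evaluated at the mixed input, which plays the role of Mixup-induced label noise; the function records test MSE after every full-batch gradient step.

```python
import numpy as np


def mixup(x, y, alpha, rng):
    """Standard Mixup: convex combinations of randomly paired samples and labels."""
    lam = rng.beta(alpha, alpha, size=(x.shape[0], 1))
    perm = rng.permutation(x.shape[0])
    return lam * x + (1 - lam) * x[perm], lam * y + (1 - lam) * y[perm]


def relu_features(x, w):
    """Fixed random ReLU features; only the linear read-out on top is trained."""
    return np.maximum(x @ w, 0.0) / np.sqrt(w.shape[1])


def train_curve(n_train=50, n_test=1000, d=10, n_features=500, alpha=1.0,
                pool_copies=4, epochs=2000, seed=0):
    """Full-batch GD on squared loss over a fixed Mixup pool; returns test MSE per step."""
    rng = np.random.default_rng(seed)
    beta = rng.normal(size=(d, 1)) / np.sqrt(d)

    def target(x):
        # Hypothetical nonlinear target, so mixed labels differ from target(x_mix).
        return np.tanh(3.0 * x @ beta)

    x_tr = rng.normal(size=(n_train, d))
    y_tr = target(x_tr)
    x_te = rng.normal(size=(n_test, d))
    y_te = target(x_te)

    # Assumption: Mixup samples are drawn once into a fixed pool, so GD can
    # eventually fit the synthetic labels, noise included.
    pairs = [mixup(x_tr, y_tr, alpha, rng) for _ in range(pool_copies)]
    x_mix = np.vstack([xm for xm, _ in pairs])
    y_mix = np.vstack([ym for _, ym in pairs])

    w = rng.normal(size=(d, n_features))
    phi_mix, phi_te = relu_features(x_mix, w), relu_features(x_te, w)

    # Step size 1/L, where L = sigma_max(phi)^2 / n is the largest eigenvalue
    # of the Hessian of the loss (1 / 2n) * ||phi @ theta - y||^2.
    n_mix = phi_mix.shape[0]
    lr = n_mix / np.linalg.norm(phi_mix, 2) ** 2

    theta = np.zeros((n_features, 1))
    test_mse = np.empty(epochs)
    for t in range(epochs):
        grad = phi_mix.T @ (phi_mix @ theta - y_mix) / n_mix
        theta -= lr * grad
        test_mse[t] = np.mean((phi_te @ theta - y_te) ** 2)
    return test_mse


curve = train_curve()
print(curve.argmin(), curve.min(), curve[-1])
```

Comparing the step with the lowest test error to the final error shows whether test error turns upward late in training for a given configuration. Rerunning with a smaller `n_train` loosely mirrors the paper's observation about reduced datasets, but whether this toy exhibits a U-shape at all depends on the sizes and target chosen; it is not a reproduction of the paper's experiments.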
