LG | Jun 5, 2021

k-Mixup Regularization for Deep Learning via Optimal Transport

arXiv:2106.02933v2 | 19 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for better regularization techniques in deep learning to enhance model performance and robustness, offering an incremental improvement over existing mixup methods.

The paper extends mixup regularization to k-mixup, which perturbs k-batches of training points toward other k-batches using optimal transport, with the aim of improving generalization and robustness in deep learning. It reports that k-mixup outperforms standard mixup across several architectures and datasets, with gains over mixup that are similar to or larger than the gains of mixup itself over standard empirical risk minimization (ERM).

Mixup is a popular regularization technique for training deep neural networks that improves generalization and increases robustness to certain distribution shifts. It perturbs input training data in the direction of other randomly-chosen instances in the training set. To better leverage the structure of the data, we extend mixup in a simple, broadly applicable way to $k$-mixup, which perturbs $k$-batches of training points in the direction of other $k$-batches. The perturbation is done with displacement interpolation, i.e., interpolation under the Wasserstein metric. We demonstrate theoretically and in simulations that $k$-mixup preserves cluster and manifold structures, and we extend theory studying the efficacy of standard mixup to the $k$-mixup case. Our empirical results show that training with $k$-mixup further improves generalization and robustness across several network architectures and benchmark datasets of differing modalities. For the wide variety of real datasets considered, the performance gains of $k$-mixup over standard mixup are similar to or larger than the gains of mixup itself over standard ERM after hyperparameter optimization. In several instances, in fact, $k$-mixup achieves gains in settings where standard mixup has negligible to zero improvement over ERM.
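
The perturbation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that both k-batches are uniform point clouds of equal size, in which case an optimal transport plan under squared Euclidean cost can be taken to be a one-to-one matching, and that the interpolation weight is drawn from a Beta(alpha, alpha) distribution as in standard mixup. The function name `k_mixup_batch` and its parameters are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist


def k_mixup_batch(x1, y1, x2, y2, alpha=1.0, rng=None):
    """Mix two k-batches along an optimal matching (hypothetical helper).

    x1, x2: arrays of shape (k, ...) holding inputs.
    y1, y2: arrays of shape (k, num_classes) holding one-hot or soft labels.
    alpha:  Beta(alpha, alpha) parameter for the mixing weight (assumption).
    """
    rng = np.random.default_rng() if rng is None else rng
    k = x1.shape[0]

    # Pairwise squared Euclidean costs between the flattened inputs.
    cost = cdist(x1.reshape(k, -1), x2.reshape(k, -1), metric="sqeuclidean")

    # For two uniform point clouds of equal size, an optimal transport plan
    # can be chosen as a permutation, found by solving an assignment problem.
    rows, cols = linear_sum_assignment(cost)

    # Displacement interpolation: move each point part of the way toward
    # its matched partner, and mix the labels with the same weight.
    lam = rng.beta(alpha, alpha)
    x_mix = lam * x1[rows] + (1.0 - lam) * x2[cols]
    y_mix = lam * y1[rows] + (1.0 - lam) * y2[cols]
    return x_mix, y_mix, lam


# Two well-separated clusters; the second batch lists them in reverse order.
x1 = np.array([[0.0, 0.0], [10.0, 10.0]])
y1 = np.array([[1.0, 0.0], [0.0, 1.0]])
x2 = np.array([[10.5, 10.5], [0.5, 0.5]])
y2 = np.array([[0.0, 1.0], [1.0, 0.0]])

x_mix, y_mix, lam = k_mixup_batch(x1, y1, x2, y2, rng=np.random.default_rng(0))
```

In this toy example the matching pairs each point with the nearby point from the other batch, so the mixed points stay inside their original clusters and the labels are unchanged. With k = 1 the matching is trivial and the sketch reduces to standard mixup on a single pair.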

Code Implementations: 1 repo
