LG · CV · Nov 9, 2023

Embedding Space Interpolation Beyond Mini-Batch, Beyond Pairs and Beyond Examples

arXiv:2311.05538v1 · 7 citations · h-index: 43
Originality: Incremental advance
AI Analysis

This work extends interpolation-based data augmentation beyond the usual limits of mini-batch size and pairwise mixing. It is an incremental advance that builds on existing Mixup techniques.

The paper addresses the limitations of Mixup data augmentation by introducing MultiMix and Dense MultiMix, which generate far more interpolated examples than the mini-batch size and interpolate more than two examples at a time. The authors report significant improvements over state-of-the-art mixup methods on four benchmarks, and find that classes become more tightly clustered and more uniformly spread in the embedding space.

Mixup refers to interpolation-based data augmentation, originally motivated as a way to go beyond empirical risk minimization (ERM). Its extensions mostly focus on the definition of interpolation and the space (input or feature) where it takes place, while the augmentation process itself is less studied. In most methods, the number of generated examples is limited to the mini-batch size and the number of examples being interpolated is limited to two (pairs), in the input space. We make progress in this direction by introducing MultiMix, which generates an arbitrarily large number of interpolated examples beyond the mini-batch size and interpolates the entire mini-batch in the embedding space. Effectively, we sample on the entire convex hull of the mini-batch rather than along linear segments between pairs of examples. On sequence data, we further extend to Dense MultiMix. We densely interpolate features and target labels at each spatial location and also apply the loss densely. To mitigate the lack of dense labels, we inherit labels from examples and weight interpolation factors by attention as a measure of confidence. Overall, we increase the number of loss terms per mini-batch by orders of magnitude at little additional cost. This is only possible because of interpolating in the embedding space. We empirically show that our solutions yield significant improvement over state-of-the-art mixup methods on four different benchmarks, despite interpolation being only linear. By analyzing the embedding space, we show that the classes are more tightly clustered and uniformly spread over the embedding space, thereby explaining the improved behavior.
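
The core idea in the abstract, sampling on the convex hull of a mini-batch of embeddings rather than along segments between pairs, can be illustrated with a small sketch. This is not the authors' implementation. It assumes the interpolation weights are drawn from a symmetric Dirichlet distribution (the abstract does not say how they are sampled), and the names `sample_simplex`, `multimix`, `alpha`, and `n` are hypothetical, chosen only for illustration.

```python
import random


def sample_simplex(m, alpha, rng):
    """Draw one weight vector over m items from a symmetric Dirichlet(alpha).

    Assumption: Dirichlet sampling is used here only as one convenient way
    to obtain non-negative weights that sum to 1.
    """
    gammas = [rng.gammavariate(alpha, 1.0) for _ in range(m)]
    total = sum(gammas)
    return [g / total for g in gammas]


def multimix(embeddings, labels, n, alpha=1.0, seed=0):
    """Return n convex combinations of a mini-batch.

    embeddings: list of m vectors (lists of floats), one per example.
    labels:     list of m one-hot (or soft) label vectors.
    n:          number of interpolated examples; may exceed m.
    """
    rng = random.Random(seed)
    m = len(embeddings)
    dim = len(embeddings[0])
    num_classes = len(labels[0])

    mixed_embeddings, mixed_labels = [], []
    for _ in range(n):
        # One weight per example in the mini-batch, not just a pair.
        lam = sample_simplex(m, alpha, rng)
        # The same weights interpolate both the embeddings and the targets.
        z = [sum(w * e[d] for w, e in zip(lam, embeddings)) for d in range(dim)]
        y = [sum(w * t[c] for w, t in zip(lam, labels)) for c in range(num_classes)]
        mixed_embeddings.append(z)
        mixed_labels.append(y)
    return mixed_embeddings, mixed_labels


if __name__ == "__main__":
    # Toy mini-batch: 3 examples, 2-dimensional embeddings, 2 classes.
    batch = [[0.0, 1.0], [2.0, 3.0], [4.0, -1.0]]
    targets = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
    z_mix, y_mix = multimix(batch, targets, n=8)
    print(len(z_mix), "interpolated examples from a mini-batch of", len(batch))
```

Because each interpolated example costs only a weighted sum over embeddings that have already been computed, `n` can be much larger than the mini-batch size without further passes through the encoder, which is consistent with the abstract's remark that the approach depends on interpolating in the embedding space.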
