LG CV ML | Feb 20, 2020

Affinity and Diversity: Quantifying Mechanisms of Data Augmentation

Raphael Gontijo-Lopes, Sylvia J. Smullin, Ekin D. Cubuk, Ethan Dyer

arXiv:2002.08973v2 | 23.286 citations | h-index: 50

Originality: Incremental advance

AI Analysis

This work addresses a fundamental gap in machine learning theory that matters to researchers and practitioners alike, though the advance is incremental because it builds on existing heuristic approaches.

The paper tackles the problem of understanding why data augmentation improves model generalization. It introduces two interpretable measures, Affinity and Diversity, and finds that augmentation performance is predicted not by either measure alone but by jointly optimizing the two.

Though data augmentation has become a standard component of deep neural network training, the underlying mechanism behind the effectiveness of these techniques remains poorly understood. In practice, augmentation policies are often chosen using heuristics of either distribution shift or augmentation diversity. Inspired by these, we seek to quantify how data augmentation improves model generalization. To this end, we introduce interpretable and easy-to-compute measures: Affinity and Diversity. We find that augmentation performance is predicted not by either of these alone but by jointly optimizing the two.
