CV | Mar 16, 2023

Instance-Conditioned GAN Data Augmentation for Representation Learning

arXiv:2303.09677v1 | 10 citations | h-index: 46
Originality: Incremental advance
AI Analysis

This work provides an incremental improvement for researchers and practitioners in computer vision by offering a learnable data augmentation tool that can be integrated into existing training pipelines.

The paper tackles the problem of handcrafting data augmentations for visual representation learning by introducing DA_IC-GAN, an instance-conditioned GAN module that boosts accuracy by 1-2%p on ImageNet with ResNets and DeiT models and improves robustness in out-of-distribution transfers.

Data augmentation has become a crucial component in training state-of-the-art visual representation models. However, handcrafting combinations of transformations that lead to improved performance is a laborious task, which can result in visually unrealistic samples. To overcome these limitations, recent works have explored the use of generative models as learnable data augmentation tools, showing promising results in narrow application domains, e.g., few-shot learning and low-data medical imaging. In this paper, we introduce a data augmentation module, called DA_IC-GAN, which leverages instance-conditioned GAN generations and can be used off-the-shelf in conjunction with most state-of-the-art training recipes. We showcase the benefits of DA_IC-GAN by plugging it out-of-the-box into the supervised training of ResNets and DeiT models on the ImageNet dataset, achieving accuracy boosts of between 1%p and 2%p with the highest capacity models. Moreover, the learnt representations are shown to be more robust than the baselines when transferred to a handful of out-of-distribution datasets, and exhibit increased invariance to variations in instance and viewpoint. We additionally couple DA_IC-GAN with a self-supervised training recipe and show that we can also achieve an improvement of 1%p in accuracy in some settings. With this work, we strengthen the evidence on the potential of learnable data augmentations to improve visual representation learning, paving the road towards non-handcrafted augmentations in model training.
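
The abstract does not spell out how the module plugs into a training recipe, so the following is only a minimal sketch of one plausible reading: a wrapper that, with some probability, swaps a training image for a sample generated conditionally on that image, and otherwise passes the real image through to whatever handcrafted augmentations follow. The class name GenerativeAugmentation, the parameter p_gen, and the generate callable are hypothetical names introduced for illustration; they are not taken from the paper, and the stub generator stands in for a pretrained instance-conditioned GAN.

```python
import random
from typing import Callable, Generic, Optional, TypeVar

Image = TypeVar("Image")


class GenerativeAugmentation(Generic[Image]):
    """Swap an image for a generated one with probability p_gen.

    Assumption: `generate` wraps a pretrained generator that is conditioned
    on the input image. It is a hypothetical callable, not a real API.
    """

    def __init__(
        self,
        generate: Callable[[Image], Image],
        p_gen: float = 0.5,
        rng: Optional[random.Random] = None,
    ) -> None:
        if not 0.0 <= p_gen <= 1.0:
            raise ValueError("p_gen must lie in [0, 1]")
        self.generate = generate
        self.p_gen = p_gen
        self.rng = rng if rng is not None else random.Random()

    def __call__(self, image: Image) -> Image:
        # rng.random() is drawn from [0, 1), so p_gen=0 never swaps
        # and p_gen=1 always swaps.
        if self.rng.random() < self.p_gen:
            return self.generate(image)
        return image


def stub_generate(image):
    # Placeholder for a real generator: it only tags the input.
    return ("generated", image)


always = GenerativeAugmentation(stub_generate, p_gen=1.0)
never = GenerativeAugmentation(stub_generate, p_gen=0.0)
mixed = GenerativeAugmentation(stub_generate, p_gen=0.5, rng=random.Random(0))

# Each output is either the real image or its generated stand-in.
batch = [mixed(f"img_{i}") for i in range(8)]
```

In a real pipeline, a wrapper of this kind would sit in front of the usual transformations, so the rest of the recipe stays unchanged. The swap probability is a tunable assumption here, not a value reported in the text above.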
