CV, LG | Mar 19, 2024

TransformMix: Learning Transformation and Mixing Strategies from Data

arXiv:2403.12429v1 | 2 citations

Originality: Incremental advance

AI Analysis

This work addresses the need for data augmentation in computer vision that adapts to the dataset at hand in order to improve model generalization. The advance appears incremental because it builds on existing sample-mixing methods.

The authors address the problem that sample-mixing augmentation methods such as Mixup and Cutmix are heuristic and do not adapt to different datasets, which can produce misleading images and limit generalization. They propose TransformMix, an automated approach that learns transformation and mixing strategies from data, and report better performance and efficiency than strong sample-mixing baselines in transfer learning, classification, object detection, and knowledge distillation.

Data augmentation improves the generalization power of deep learning models by synthesizing more training samples. Sample-mixing is a popular data augmentation approach that creates additional data by combining existing samples. Recent sample-mixing methods, like Mixup and Cutmix, adopt simple mixing operations to blend multiple inputs. Although such a heuristic approach yields performance gains in some computer vision tasks, it mixes the images blindly and does not adapt to different datasets automatically. A mixing strategy that is effective for a particular dataset often does not generalize well to other datasets. If not properly configured, the methods may create misleading mixed images, which jeopardize the effectiveness of sample-mixing augmentations. In this work, we propose an automated approach, TransformMix, to learn better transformation and mixing augmentation strategies from data. In particular, TransformMix applies learned transformations and mixing masks to create compelling mixed images that contain correct and important information for the target tasks. We demonstrate the effectiveness of TransformMix on multiple datasets in transfer learning, classification, object detection, and knowledge distillation settings. Experimental results show that our method achieves better performance and efficiency than strong sample-mixing baselines.
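For readers unfamiliar with the baselines named in the abstract, the following is a minimal sketch of the two heuristic mixing operations, Mixup and Cutmix, written with NumPy. It is not TransformMix and does not reproduce the authors' code. The function names, the height x width x channels image layout, the one-hot labels, and the square-root box sizing are assumptions made for illustration. The point is that the blend ratio and the box position are chosen without looking at image content, which is the "blind" mixing the abstract criticizes.

```python
import numpy as np


def mixup(x1, y1, x2, y2, lam):
    """Mixup baseline: convex combination of two images and their one-hot labels."""
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y


def cutmix(x1, y1, x2, y2, lam, rng):
    """Cutmix baseline: paste a random box from x2 into x1.

    Assumes images are float arrays shaped (height, width, channels).
    The box is placed at random, independent of image content.
    """
    h, w = x1.shape[:2]
    # Box side lengths chosen so the box covers roughly (1 - lam) of the image.
    cut_h = int(h * np.sqrt(1.0 - lam))
    cut_w = int(w * np.sqrt(1.0 - lam))
    cy = int(rng.integers(0, h))
    cx = int(rng.integers(0, w))
    top, bottom = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    left, right = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)

    # Binary mask: 1 keeps the pixel from x1, 0 takes the pixel from x2.
    mask = np.ones((h, w, 1), dtype=x1.dtype)
    mask[top:bottom, left:right] = 0.0

    x = mask * x1 + (1.0 - mask) * x2
    # Labels are weighted by the area actually kept after clipping the box.
    kept = float(mask.mean())
    y = kept * y1 + (1.0 - kept) * y2
    return x, y, mask


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img_a = np.zeros((8, 8, 3), dtype=np.float32)
    img_b = np.ones((8, 8, 3), dtype=np.float32)
    lab_a = np.array([1.0, 0.0])
    lab_b = np.array([0.0, 1.0])
    mixed, label = mixup(img_a, lab_a, img_b, lab_b, lam=0.25)
    pasted, label_cm, mask = cutmix(img_a, lab_a, img_b, lab_b, lam=0.5, rng=rng)
    print(mixed.mean(), label, label_cm)
```

According to the abstract, TransformMix replaces the fixed blend and random box with learned transformations and learned mixing masks. How those are parameterized and trained is not described in this summary, so the sketch covers only the baselines.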
