Learning robust visual representations using data augmentation invariance
The work addresses a robustness gap in learned visual representations that matters for both AI and neuroscience applications, but the contribution is incremental because it builds on existing data augmentation methods.
The paper tackles the problem that convolutional neural networks trained for object categorization are not robust to identity-preserving image transformations, despite theoretical arguments that such invariance should emerge from optimization. It proposes data augmentation invariance, an unsupervised objective that promotes similarity between the activations of augmented image samples. The reported outcome is increased invariance with similar categorization performance, at the cost of a 10% increase in training time.
Deep convolutional neural networks trained for image object categorization have shown remarkable similarities with representations found across the primate ventral visual stream. Yet, artificial and biological networks still exhibit important differences. Here we investigate one such property: increasing invariance to identity-preserving image transformations found along the ventral stream. Despite theoretical evidence that invariance should emerge naturally from the optimization process, we present empirical evidence that the activations of convolutional neural networks trained for object categorization are not robust to identity-preserving image transformations commonly used in data augmentation. As a solution, we propose data augmentation invariance, an unsupervised learning objective which improves the robustness of the learned representations by promoting the similarity between the activations of augmented image samples. Our results show that this approach is a simple, yet effective and efficient (10 % increase in training time) way of increasing the invariance of the models while obtaining similar categorization performance.
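The abstract describes the objective only in words, so the following is a minimal sketch of one possible reading, not the authors' formulation. It assumes the activations of a batch are arranged as (images, augmentations per image, features) and scores invariance as the mean squared distance between augmented samples of the same image, divided by the mean squared distance between all samples in the batch. The function name, the array layout, the squared Euclidean distance, and the normalization are all assumptions made for illustration.

```python
import numpy as np


def augmentation_invariance_loss(activations, eps=1e-12):
    """Hypothetical invariance score for one layer's activations.

    activations: array of shape (n_images, n_augmentations, n_features),
    where activations[i, j] is the response to the j-th augmented
    version of image i. Lower values mean more invariant activations.
    """
    n_images, n_augs, n_features = activations.shape
    if n_augs < 2:
        raise ValueError("need at least two augmented samples per image")

    # Flatten so that rows are grouped by source image.
    flat = activations.reshape(n_images * n_augs, n_features)

    # Pairwise squared Euclidean distances between all samples.
    sq_norms = np.sum(flat ** 2, axis=1)
    dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * flat @ flat.T
    dists = np.maximum(dists, 0.0)  # clip small negative rounding errors

    # Mark pairs that come from the same source image, excluding self-pairs.
    image_ids = np.repeat(np.arange(n_images), n_augs)
    same_image = image_ids[:, None] == image_ids[None, :]
    not_self = ~np.eye(n_images * n_augs, dtype=bool)

    within = dists[same_image & not_self].mean()  # same image, different augmentation
    overall = dists[not_self].mean()              # any two distinct samples

    # Assumed normalization: distance within images relative to the batch.
    return within / (overall + eps)


# Usage: perfectly invariant activations score near zero.
rng = np.random.default_rng(0)
base = rng.normal(size=(4, 1, 8))
invariant = np.repeat(base, 3, axis=1)  # 3 identical "augmentations" per image
noisy = invariant + rng.normal(size=invariant.shape)
loss_invariant = augmentation_invariance_loss(invariant)
loss_noisy = augmentation_invariance_loss(noisy)
```

In training, a term like this would presumably be combined with the categorization loss through a weighting factor. That weighting, and the choice of layers at which activations are compared, are not specified in the text above and would need to be taken from the paper itself.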