Revisiting Data Augmentation for Rotational Invariance in Convolutional Neural Networks
This paper addresses the challenge of rotational invariance in computer vision tasks, but its contribution is incremental: it revisits and validates an existing data augmentation approach without introducing new methods.
The paper tackles the problem of achieving rotational invariance in CNNs for image classification. It finds that data augmentation alone enables networks to classify rotated images nearly as well as unrotated ones, and that specialized methods such as Spatial Transformer Networks and Group Equivariant CNNs yield no significant accuracy increase.
Convolutional Neural Networks (CNNs) offer state-of-the-art performance in various computer vision tasks. Many of those tasks require invariance to different subtypes of affine image transformations (scale, rotation, translation). Convolutional layers are translation equivariant by design, but in their basic form they are not invariant to these transformations. In this work we investigate how best to include rotational invariance in a CNN for image classification. Our experiments show that networks trained with data augmentation alone can classify rotated images nearly as well as in the normal unrotated case; this increase in representational power comes only at the cost of training time. We also compare data augmentation with two modified CNN models for achieving rotational invariance or equivariance, Spatial Transformer Networks and Group Equivariant CNNs, finding no significant accuracy increase with these specialized methods. In the case of data-augmented networks, we also analyze which layers help the network to encode the rotational invariance, which is important for understanding its limitations and how best to retrain a network with data augmentation to achieve invariance to rotation.
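To make the data augmentation idea concrete, the following is a minimal sketch of rotation-based augmentation in plain Python. It is an illustration under stated assumptions, not the procedure used in the paper: the function names `rotate_image` and `augment_with_rotation`, the nearest-neighbor sampling, the zero fill for out-of-bounds pixels, and the uniform angle range of 0 to 360 degrees are all hypothetical choices made here for clarity.

```python
import math
import random


def rotate_image(image, angle_deg, fill=0):
    """Rotate a 2D image (list of rows) about its center by angle_deg.

    Assumed details (not from the paper): inverse mapping with
    nearest-neighbor sampling; pixels that map outside the source
    image are set to `fill`.
    """
    height, width = len(image), len(image[0])
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)

    rotated = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Inverse map: find the source pixel that lands on (x, y).
            dx, dy = x - cx, y - cy
            sx = round(cos_t * dx + sin_t * dy + cx)
            sy = round(-sin_t * dx + cos_t * dy + cy)
            if 0 <= sx < width and 0 <= sy < height:
                rotated[y][x] = image[sy][sx]
    return rotated


def augment_with_rotation(images, labels, rng=None, max_angle=360.0):
    """Return (rotated_image, label) pairs, one random angle per image.

    Labels are left unchanged, which is what pushes a classifier
    trained on the result toward rotation-invariant predictions.
    """
    rng = rng or random.Random()
    pairs = []
    for image, label in zip(images, labels):
        angle = rng.uniform(0.0, max_angle)  # assumed angle distribution
        pairs.append((rotate_image(image, angle), label))
    return pairs


if __name__ == "__main__":
    toy = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    for row in rotate_image(toy, 90):
        print(row)
```

In a training loop, a function like `augment_with_rotation` would typically be applied to each batch so the network sees a fresh random rotation of every sample in each epoch. In practice, an image library with interpolated rotation would usually replace the nearest-neighbor loop shown here.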