Geometric Data Augmentation Based on Feature Map Ensemble
This work addresses robustness in computer vision applications that require invariance to geometric transformations, but the contribution is incremental: it builds on existing CNNs and data augmentation methods.
The paper tackles the degradation of CNN performance under geometric transformations such as large rotations. It proposes an architecture that encloses an existing backbone with a geometric transformation, the corresponding reverse transformation, and a feature map ensemble, which improves robustness without modifying the backbone. The improvement is demonstrated on datasets such as CIFAR, CUB-200, and Mnist-rot-12k.
Deep convolutional networks have become the mainstream approach in computer vision applications. Although CNNs have been successful in many computer vision tasks, they are not free from drawbacks: their performance degrades dramatically under geometric transformations such as large rotations. In this paper, we propose a novel CNN architecture that improves robustness against geometric transformations without modifying the existing backbone. The key is to enclose the existing backbone with a geometric transformation (and the corresponding reverse transformation) and a feature map ensemble. The proposed method can therefore inherit the strengths of the CNNs presented so far. Furthermore, it can be combined with state-of-the-art data augmentation algorithms to improve their performance. We demonstrate the effectiveness of the proposed method on standard datasets such as CIFAR, CUB-200, and Mnist-rot-12k.
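
The idea of enclosing a backbone with a transformation, its reverse, and a feature map ensemble can be illustrated with a small sketch. This is not the paper's implementation. It assumes only four 90-degree rotations, averaging as the ensemble rule, and a hypothetical toy_backbone (a horizontal difference filter) standing in for a real CNN. The names rot90, toy_backbone, and ensemble_features are all illustrative.

```python
def rot90(grid, k=1):
    """Rotate a square 2D list counter-clockwise by k * 90 degrees."""
    for _ in range(k % 4):
        # Transpose, then reverse the row order: one counter-clockwise quarter turn.
        grid = [list(row) for row in zip(*grid)][::-1]
    return [list(row) for row in grid]


def toy_backbone(image):
    """Hypothetical stand-in for a CNN backbone (assumption, not from the paper).

    Computes a horizontal difference with zero padding on the left. It keeps
    the input shape and is deliberately NOT rotation-equivariant.
    """
    return [
        [row[j] - (row[j - 1] if j > 0 else 0) for j in range(len(row))]
        for row in image
    ]


def ensemble_features(backbone, image):
    """Wrap an unmodified backbone with rotations and a feature map ensemble.

    For each of the four quarter-turn rotations (assumed transformation set):
      1. rotate the input,
      2. run the unchanged backbone,
      3. apply the reverse rotation to the feature map,
    then average the aligned feature maps (assumed ensemble rule).
    """
    n = len(image)
    total = [[0] * n for _ in range(n)]
    for k in range(4):
        features = backbone(rot90(image, k))
        aligned = rot90(features, -k)  # reverse transformation
        for i in range(n):
            for j in range(n):
                total[i][j] += aligned[i][j]
    return [[value / 4 for value in row] for row in total]


image = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]

# The bare backbone reacts differently to a rotated input ...
backbone_is_equivariant = toy_backbone(rot90(image)) == rot90(toy_backbone(image))

# ... while the wrapped version produces features that simply rotate with the input.
wrapped_is_equivariant = (
    ensemble_features(toy_backbone, rot90(image))
    == rot90(ensemble_features(toy_backbone, image))
)
```

Under these assumptions, the wrapped feature map rotates together with the input even though the backbone itself is untouched, which is the property the abstract attributes to the enclosing structure. Arbitrary rotation angles, learned backbones, and other ensemble rules would need interpolation and a framework such as PyTorch, and are outside this sketch.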