Towards Better Guided Attention and Human Knowledge Insertion in Deep Convolutional Neural Networks
This work addresses the need for better interpretability and performance in image recognition, particularly for fine-grained tasks. The contribution is incremental, since it extends existing Attention Branch Networks (ABNs) rather than proposing a new architecture family.
The authors tackle the problem of improving visual explanation and performance in deep convolutional neural networks by introducing Multi-Scale Attention Branch Networks (MSABN), which enhance the resolution of the attention maps and outperform both ABN and baseline models on benchmark image recognition and fine-grained recognition datasets. They also develop a data augmentation strategy that uses the attention maps to incorporate human knowledge from bounding box annotations, and report a significant performance gain even with a limited number of edited samples.
Attention Branch Networks (ABNs) have been shown to simultaneously provide visual explanation and improve the performance of deep convolutional neural networks (CNNs). In this work, we introduce Multi-Scale Attention Branch Networks (MSABN), which enhance the resolution of the generated attention maps and improve performance. We evaluate MSABN on benchmark image recognition and fine-grained recognition datasets, where MSABN outperforms ABN and baseline models. We also introduce a new data augmentation strategy that utilizes the attention maps to incorporate human knowledge in the form of bounding box annotations of the objects of interest. We show that even with a limited number of edited samples, a significant performance gain can be achieved with this strategy.
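The abstract does not spell out how bounding boxes are used in the augmentation, so the following is only an illustrative sketch of one plausible building block, not the authors' procedure. It assumes a box given as `(x_min, y_min, x_max, y_max)` with exclusive upper bounds, rasterizes it into a binary mask, and down-weights pixels outside the box. The function names `bbox_to_mask` and `attenuate_background` and the `background_weight` parameter are hypothetical, introduced here for illustration.

```python
# Illustrative sketch only: the names and the attenuation rule are assumptions,
# not the method described in the paper.

def bbox_to_mask(height, width, box):
    """Rasterize a bounding box into a binary mask (a list of rows).

    box is (x_min, y_min, x_max, y_max) in pixel coordinates, with x_max and
    y_max exclusive. Coordinates are clipped to the grid.
    """
    x_min, y_min, x_max, y_max = box
    x_min, x_max = max(0, x_min), min(width, x_max)
    y_min, y_max = max(0, y_min), min(height, y_max)
    return [
        [1.0 if (y_min <= y < y_max and x_min <= x < x_max) else 0.0
         for x in range(width)]
        for y in range(height)
    ]


def attenuate_background(image, mask, background_weight=0.0):
    """Keep pixels inside the mask and scale pixels outside it.

    image and mask are equally sized lists of rows of floats.
    background_weight = 0.0 removes the background entirely,
    background_weight = 1.0 leaves the image unchanged.
    """
    return [
        [pixel * (m + (1.0 - m) * background_weight)
         for pixel, m in zip(image_row, mask_row)]
        for image_row, mask_row in zip(image, mask)
    ]


if __name__ == "__main__":
    image = [[1.0] * 4 for _ in range(4)]
    mask = bbox_to_mask(4, 4, (1, 1, 3, 3))
    for row in attenuate_background(image, mask, background_weight=0.5):
        print(row)
```

In a real pipeline the same mask would typically be resized to the attention map resolution and applied to image tensors rather than nested lists. Plain Python lists are used here only to keep the sketch dependency-free.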