Mask Guided Attention For Fine-Grained Patchy Image Classification
This work provides an incremental improvement for fine-grained patchy image classification, particularly benefiting domains with limited data and subtle visual distinctions.
This paper addresses fine-grained patchy image classification, a task challenged by subtle inter-category differences and limited training data. The authors propose Mask Guided Attention (MGA), which uses a pre-trained semantic segmentation model to generate patchy attention masks as an auxiliary supervision signal. The mask drives the classifier to filter out insignificant image parts, such as features common to different categories. MGA outperforms state-of-the-art methods on three public datasets, and an ablation study shows that it improves accuracy by 2.25% on SoyCultivarVein and 2% on BtfPIS.
In this work, we present a novel mask guided attention (MGA) method for fine-grained patchy image classification. The key challenge of fine-grained patchy image classification is twofold: ultra-fine-grained inter-category variances among objects, and very little data available for training. This motivates us to employ a more useful supervision signal to train a discriminative model from limited training samples. Specifically, the proposed MGA integrates a pre-trained semantic segmentation model that produces an auxiliary supervision signal, i.e., a patchy attention mask, enabling discriminative representation learning. The patchy attention mask drives the classifier to filter out the insignificant parts of images (e.g., common features between different categories), which enhances the robustness of MGA for fine-grained patchy image classification. We verify the effectiveness of our method on three publicly available patchy image datasets. Experimental results demonstrate that MGA achieves superior performance on all three datasets compared with state-of-the-art methods. In addition, our ablation study shows that MGA improves accuracy by 2.25% and 2% on the SoyCultivarVein and BtfPIS datasets, respectively, indicating its practicality for fine-grained patchy image classification.
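To make the idea of a mask filtering out insignificant parts concrete, the following is a minimal sketch of one plausible mechanism, mask-weighted average pooling over patch features. The abstract does not specify how MGA applies the patchy attention mask, so the function name `masked_average_pool`, the grid layout, and the pooling rule are all assumptions made for illustration and should not be read as the authors' implementation.

```python
from typing import List

Grid = List[List[float]]  # H x W grid of per-patch values


def masked_average_pool(features: List[Grid], mask: Grid, eps: float = 1e-8) -> List[float]:
    """Average each feature channel over the patches the mask marks as significant.

    features: C channels, each an H x W grid of patch activations.
    mask:     H x W grid of weights in [0, 1]; here it stands in for the output
              of a segmentation model (hypothetical, for illustration only).
    """
    mask_sum = sum(sum(row) for row in mask)
    pooled = []
    for channel in features:
        # Weight every patch activation by its mask value, then sum.
        weighted = sum(
            f * m
            for f_row, m_row in zip(channel, mask)
            for f, m in zip(f_row, m_row)
        )
        # eps guards against division by zero when the mask is all zeros.
        pooled.append(weighted / (mask_sum + eps))
    return pooled


# Toy input: one channel on a 2 x 2 patch grid. The bottom row holds large
# activations that we pretend are shared by every category.
features = [[[1.0, 3.0], [100.0, 100.0]]]
mask = [[1.0, 1.0], [0.0, 0.0]]  # keep the top row, suppress the bottom row

pooled = masked_average_pool(features, mask)
plain_average = sum(sum(row) for row in features[0]) / 4
```

In this toy case the unmasked average is 51.0 because the two suppressed patches dominate it, whereas the mask-weighted average is approximately 2.0 and depends only on the patches the mask retains. A learned, soft mask would take fractional values rather than the hard 0/1 weights used here.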