Superpixel-based Knowledge Infusion in Deep Neural Networks for Image Classification
This work addresses the challenge of enhancing computer vision models for tasks like object recognition and medical imaging by integrating superpixel-based relational data, though it appears incremental as it combines existing CNN and GNN methods.
The paper tackled the problem of improving image classification by incorporating higher-order relational information from superpixels into deep neural networks, resulting in performance gains across seven diverse datasets.
Superpixels are higher-order perceptual groups of pixels in an image, often carrying much more information than the raw pixels. There is an inherent relational structure to the relationship among different superpixels of an image such as adjacent superpixels are neighbours of each other. Our interest here is to treat these relative positions of various superpixels as relational information of an image. This relational information can convey higher-order spatial information about the image, such as the relationship between superpixels representing two eyes in an image of a cat. That is, two eyes are placed adjacent to each other in a straight line or the mouth is below the nose. Our motive in this paper is to assist computer vision models, specifically those based on Deep Neural Networks (DNNs), by incorporating this higher-order information from superpixels. We construct a hybrid model that leverages (a) Convolutional Neural Network (CNN) to deal with spatial information in an image and (b) Graph Neural Network (GNN) to deal with relational superpixel information in the image. The proposed model is learned using a generic hybrid loss function. Our experiments are extensive, and we evaluate the predictive performance of our proposed hybrid vision model on seven different image classification datasets from a variety of domains such as digit and object recognition, biometrics, medical imaging. The results demonstrate that the relational superpixel information processed by a GNN can improve the performance of a standard CNN-based vision system.