IV, CV, LG · Jul 31, 2020

A Novel Global Spatial Attention Mechanism in Convolutional Neural Network for Medical Image Classification

arXiv:2007.15897v1 · 15 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for shared attention maps in medical image classification, where images often depict similar symptoms, offering an incremental improvement over existing local attention methods.

The paper addresses a limitation of local attention mechanisms in CNNs: their weight maps are image-specific, which is suboptimal for medical images that share structural content. It proposes a global spatial attention mechanism that uses a binary classifier to produce a single weight map shared by all images. In experiments on medical and facial expression datasets, the mechanism improved the performance of CNNs such as GoogleNet and ResNet, and the attended regions were meaningful, which aids interpretability.

Spatial attention has been introduced to convolutional neural networks (CNNs) to improve both their performance and their interpretability in visual tasks, including image classification. The essence of spatial attention is to learn a weight map that represents the relative importance of activations within the same layer or channel. All existing attention mechanisms are local, in the sense that the weight maps are image-specific. In the medical field, however, there are cases in which all images should share the same weight map, because the set of images records the same kind of symptom related to the same object and therefore shares the same structural content. In this paper, we thus propose a novel global spatial attention mechanism in CNNs, mainly for medical image classification. The global weight map is instantiated by a decision boundary between important and unimportant pixels. We propose to realize this decision boundary with a binary classifier in which the intensities of all images at a pixel serve as the features of that pixel. The binary classifier is integrated into an image classification CNN and is optimized together with it. Experiments on two medical image datasets and one facial expression dataset showed that the proposed attention not only improves the performance of four powerful CNNs (GoogleNet, VGG, ResNet, and DenseNet) but also yields meaningful attended regions, which helps in understanding the content of images in a domain.
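The idea of a shared weight map derived from per-pixel features can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the logistic (sigmoid) form of the binary classifier, and the random parameters are assumptions made for illustration, and the joint optimization with the CNN described in the abstract is omitted. The sketch only shows the forward pass: each pixel is treated as one sample whose features are its intensities across all images, and the classifier output is reshaped into a single map applied to every image.

```python
import numpy as np

def global_attention_map(images, w, b):
    """Hypothetical helper: build one weight map shared by all images.

    images: array of shape (N, H, W), a stack of N grayscale images.
    w:      array of shape (N,), assumed linear classifier weights.
    b:      scalar bias (assumed).
    Returns an (H, W) map with values strictly between 0 and 1.
    """
    n, h, wid = images.shape
    # Each pixel is one sample; its features are its intensities in all N images.
    pixel_features = images.reshape(n, h * wid).T  # shape (H*W, N)
    logits = pixel_features @ w + b                # shape (H*W,)
    # Sigmoid as a soft "important vs. unimportant" decision (an assumption).
    return (1.0 / (1.0 + np.exp(-logits))).reshape(h, wid)

def apply_global_attention(images, weight_map):
    # The same map is broadcast over every image, unlike image-specific attention.
    return images * weight_map[None, :, :]

rng = np.random.default_rng(0)
images = rng.random((8, 4, 4))   # toy data, not a real dataset
w = rng.normal(size=8)           # random stand-in for learned parameters
b = 0.0

weight_map = global_attention_map(images, w, b)
attended = apply_global_attention(images, weight_map)
```

In this sketch the classifier's input dimension equals the number of images, so the map is tied to a fixed image set; how the paper handles training versus test images is not stated in the abstract and is left open here.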

