CV · Sep 2, 2025

Systematic Integration of Attention Modules into CNNs for Accurate and Generalizable Medical Image Diagnosis

Zahid Ullah, Minki Hong, Tahir Mahmood, Jihie Kim

arXiv:2509.05343v1 · 3.61 citations · h-index: 3

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for accurate and generalizable medical image diagnosis tools, though it is incremental as it applies existing attention modules to standard CNNs.

The authors address the tendency of conventional CNNs to miss fine-grained features in medical images by systematically integrating attention mechanisms into five CNN architectures. The attention-augmented models consistently outperformed their baselines on two distinct medical imaging datasets, and EfficientNetB5 with hybrid attention achieved the highest overall performance.

Deep learning has become a powerful tool for medical image analysis; however, conventional Convolutional Neural Networks (CNNs) often fail to capture the fine-grained and complex features critical for accurate diagnosis. To address this limitation, we systematically integrate attention mechanisms into five widely adopted CNN architectures, namely VGG16, ResNet18, InceptionV3, DenseNet121, and EfficientNetB5, to enhance their ability to focus on salient regions and improve discriminative performance. Specifically, each baseline model is augmented with either a Squeeze-and-Excitation block or a hybrid Convolutional Block Attention Module, allowing adaptive recalibration of channel and spatial feature representations. The proposed models are evaluated on two distinct medical imaging datasets: a brain tumor MRI dataset comprising multiple tumor subtypes, and a Products of Conception histopathological dataset containing four tissue categories. Experimental results demonstrate that attention-augmented CNNs consistently outperform baseline architectures across all metrics. In particular, EfficientNetB5 with hybrid attention achieves the highest overall performance, delivering substantial gains on both datasets. Beyond improved classification accuracy, attention mechanisms enhance feature localization, leading to better generalization across heterogeneous imaging modalities. This work contributes a systematic comparative framework for embedding attention modules in diverse CNN architectures and rigorously assesses their impact across multiple medical imaging tasks. The findings provide practical insights for the development of robust, interpretable, and clinically applicable deep-learning-based decision support systems.
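The channel recalibration that the abstract attributes to the Squeeze-and-Excitation block can be sketched roughly as follows. This is a dependency-free illustration in plain Python, not the authors' implementation: the function names, the tiny 4-channel feature map, the reduction ratio of 2, and the random weights are all assumptions chosen for readability, and biases are omitted. A real model would use a deep learning framework and learned weights. The hybrid Convolutional Block Attention Module adds a spatial attention step that is not shown here.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(weights, vec):
    """Multiply a (rows x cols) weight matrix by a vector of length cols."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def se_block(feature_map, w_reduce, w_expand):
    """Apply a Squeeze-and-Excitation step to a feature map shaped [C][H][W].

    w_reduce: (C/r x C) bottleneck weights (hypothetical values here).
    w_expand: (C x C/r) expansion weights (hypothetical values here).
    """
    # Squeeze: global average pooling yields one descriptor per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation: bottleneck layer with ReLU, then sigmoid gates in (0, 1).
    hidden = [max(0.0, h) for h in matvec(w_reduce, squeezed)]
    gates = [sigmoid(g) for g in matvec(w_expand, hidden)]
    # Scale: reweight every value in a channel by that channel's gate.
    scaled = [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(feature_map, gates)
    ]
    return scaled, gates


# Toy input: 4 channels of 3x3 activations, reduction ratio r = 2 (assumed).
random.seed(0)
C, H, W, r = 4, 3, 3, 2
feature_map = [
    [[random.random() for _ in range(W)] for _ in range(H)] for _ in range(C)
]
w_reduce = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(C // r)]
w_expand = [[random.uniform(-1, 1) for _ in range(C // r)] for _ in range(C)]

scaled, gates = se_block(feature_map, w_reduce, w_expand)
print("per-channel gates:", [round(g, 3) for g in gates])
```

The output keeps the input's shape; only the relative weight of each channel changes, which is why a block like this can be placed after a convolutional stage of any of the five backbones without altering the layers that follow.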
