CV · AI · Mar 12, 2025

Evaluating Visual Explanations of Attention Maps for Transformer-based Medical Imaging

arXiv:2503.09535v1 · 15 citations · h-index: 7 · ISIC/iMIMIC/EARTH/DeCaF@MICCAI
Originality: Synthesis-oriented
AI Analysis

This work addresses the explainability problem for clinicians using transformer-based medical imaging models, but it is incremental as it evaluates existing methods rather than proposing new ones.

The paper compared visual explanations from attention maps against other interpretability methods for Vision Transformers in medical imaging, finding that while attention maps sometimes outperform GradCAM, they are generally less effective than transformer-specific methods and provide inconsistent insights across four medical datasets.

Although Vision Transformers (ViTs) have recently demonstrated superior performance in medical imaging problems, they face explainability issues similar to previous architectures such as convolutional neural networks. Recent research efforts suggest that attention maps, which are part of the decision-making process of ViTs, can potentially address the explainability issue by identifying regions influencing predictions, especially in models pretrained with self-supervised learning. In this work, we compare the visual explanations of attention maps to other commonly used methods for medical imaging problems. To do so, we employ four distinct medical imaging datasets that involve the identification of (1) colonic polyps, (2) breast tumors, (3) esophageal inflammation, and (4) bone fractures and hardware implants. Through large-scale experiments on the aforementioned datasets using various supervised and self-supervised pretrained ViTs, we find that although attention maps show promise under certain conditions and generally surpass GradCAM in explainability, they are outperformed by transformer-specific interpretability methods. Our findings indicate that the efficacy of attention maps as a method of interpretability is context-dependent and may be limited as they do not consistently provide the comprehensive insights required for robust medical decision-making.

