ViConEx-Med: Visual Concept Explainability via Multi-Concept Token Transformer for Medical Image Analysis
This work addresses the need for interpretable models in high-stakes medical applications by providing visual explanations, although the contribution is incremental because it builds on existing concept-based approaches.
The paper tackles the lack of visual localization in concept-based models for medical image analysis by proposing ViConEx-Med, a transformer-based framework that uses multi-concept tokens to jointly predict and localize visual concepts. It achieves performance competitive with black-box models in both concept detection and localization precision.
Concept-based models aim to explain model decisions with human-understandable concepts. However, most existing approaches treat concepts as numerical attributes, without providing complementary visual explanations that could localize the predicted concepts. This limits their utility in real-world applications and particularly in high-stakes scenarios, such as medical use-cases. This paper proposes ViConEx-Med, a novel transformer-based framework for visual concept explainability, which introduces multi-concept learnable tokens to jointly predict and localize visual concepts. By leveraging specialized attention layers for processing visual and text-based concept tokens, our method produces concept-level localization maps while maintaining high predictive accuracy. Experiments on both synthetic and real-world medical datasets demonstrate that ViConEx-Med outperforms prior concept-based models and achieves competitive performance with black-box models in terms of both concept detection and localization precision. Our results suggest a promising direction for building inherently interpretable models grounded in visual concepts. Code is publicly available at https://github.com/CristianoPatricio/viconex-med.
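The abstract does not detail how the concept tokens attend to the image, so the following is only a minimal, hypothetical sketch of the general idea: each learnable concept token acts as a query that cross-attends to image patch tokens, and that concept's attention weights over the patch grid are read out as a coarse localization map. The function name `concept_cross_attention`, the reuse of patch tokens as keys and values, and the concept head that scores the pooled feature against the concept token are all illustrative assumptions. This is not the ViConEx-Med implementation, and it omits the text-based concept tokens and the specialized attention layers the paper describes. It uses plain Python lists so it has no dependencies.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def concept_cross_attention(
    concept_tokens: List[Vector],  # K concept queries (learnable in a real model), each of dim d
    patch_tokens: List[Vector],    # N image patch embeddings, each of dim d
    grid: Tuple[int, int],         # (H, W) patch grid, with H * W == N
) -> Tuple[List[float], List[List[List[float]]]]:
    """Hypothetical sketch: per-concept presence probabilities and H x W attention maps."""
    h, w = grid
    assert h * w == len(patch_tokens), "grid must match the number of patches"
    d = len(patch_tokens[0])
    probs: List[float] = []
    maps: List[List[List[float]]] = []
    for q in concept_tokens:
        # Scaled dot-product attention of one concept query over all patches.
        attn = softmax([dot(q, k) / math.sqrt(d) for k in patch_tokens])
        # Attention-weighted sum of patch tokens (keys double as values, for brevity).
        pooled = [sum(a * p[j] for a, p in zip(attn, patch_tokens)) for j in range(d)]
        # Assumed concept head: score the pooled feature against the concept token itself.
        logit = dot(q, pooled)
        probs.append(1.0 / (1.0 + math.exp(-logit)))
        # Reshape the attention weights into an H x W localization map for this concept.
        maps.append([attn[r * w:(r + 1) * w] for r in range(h)])
    return probs, maps


if __name__ == "__main__":
    # Two toy concepts on a 2 x 2 patch grid with 2-dimensional embeddings.
    concepts = [[4.0, 0.0], [0.0, 4.0]]
    patches = [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 1.0]]
    probs, maps = concept_cross_attention(concepts, patches, (2, 2))
    for i, (p, m) in enumerate(zip(probs, maps)):
        print(f"concept {i}: p={p:.3f}, map={m}")
```

In this toy input, concept 0 concentrates its attention on the top-left patch and concept 1 on the bottom-right patch, so each concept's map points to the region that supports it. A trained model would instead learn the concept tokens and separate query, key, and value projections by gradient descent and stack several attention layers.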