Towards generating more interpretable counterfactuals via concept vectors: a preliminary study on chest X-rays
This work addresses interpretability for medical imaging models. Its contribution is incremental: it builds on existing concept-based methods and does not outperform baselines.
The paper tackles the problem of generating interpretable counterfactuals in medical imaging by mapping clinical concepts into the latent space of generative models using Concept Activation Vectors (CAVs). Preliminary results show promise for large pathologies such as cardiomegaly, but smaller pathologies remain challenging because of reconstruction limits.
An essential step in deploying medical imaging models is ensuring that they are interpretable and aligned with clinical knowledge. We focus on mapping clinical concepts into the latent space of generative models to identify Concept Activation Vectors (CAVs). Using a simple reconstruction autoencoder, we link user-defined concepts to image-level features without explicit label training. The extracted concepts are stable across datasets, enabling visual explanations that highlight clinically relevant features. By traversing the latent space along concept directions, we produce counterfactuals that exaggerate or reduce specific clinical features. Preliminary results on chest X-rays show promise for large pathologies such as cardiomegaly, while smaller pathologies remain challenging due to reconstruction limits. Although it does not outperform baselines, this approach offers a path toward interpretable, concept-based explanations aligned with clinical knowledge.
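The idea of finding a concept direction in latent space and traversing along it can be illustrated with a minimal sketch. This is not the paper's implementation: the latent codes below are synthetic stand-ins for autoencoder encodings, the helper names `concept_activation_vector` and `traverse` are hypothetical, and the CAV is estimated as a normalized difference of class means, a simple substitute for the linear classifier commonly used to fit CAVs. In a real pipeline, each shifted code would be passed through the autoencoder's decoder to render the counterfactual image.

```python
import numpy as np


def concept_activation_vector(z_pos, z_neg):
    """Unit vector pointing from concept-absent toward concept-present latents.

    Assumption: difference of class means, used here in place of a
    fitted linear classifier.
    """
    direction = z_pos.mean(axis=0) - z_neg.mean(axis=0)
    norm = np.linalg.norm(direction)
    if norm == 0:
        raise ValueError("concept and non-concept latents have identical means")
    return direction / norm


def traverse(z, cav, alphas):
    """Shift one latent code z along the CAV by each step size in alphas."""
    alphas = np.asarray(alphas, dtype=float)
    return z[None, :] + alphas[:, None] * cav[None, :]


# Synthetic latents (assumption): the concept shifts dimension 3 only.
rng = np.random.default_rng(0)
latent_dim = 16
true_direction = np.zeros(latent_dim)
true_direction[3] = 1.0
z_neg = rng.normal(size=(200, latent_dim))                         # concept absent
z_pos = rng.normal(size=(200, latent_dim)) + 3.0 * true_direction  # concept present

cav = concept_activation_vector(z_pos, z_neg)

# Negative steps reduce the concept, positive steps exaggerate it.
z = z_neg[0]
alphas = [-2.0, 0.0, 2.0]
z_shifted = traverse(z, cav, alphas)

# Projection onto the CAV acts as a rough concept score for each shifted code.
scores = z_shifted @ cav
```

Because the CAV has unit length, each step of size `alpha` changes the concept score by exactly `alpha`, which makes the traversal strength easy to reason about. Whether the decoded images actually show the intended change depends on the reconstruction quality of the autoencoder, which is the limitation noted above for small pathologies.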