Towards Faithful Multimodal Concept Bottleneck Models
This work addresses faithful interpretability in multimodal AI and is relevant to both researchers and practitioners, though its contribution is incremental because it builds on existing CBM approaches.
The paper tackles two problems in multimodal Concept Bottleneck Models (CBMs): concept leakage and imperfect concept detection. It introduces f-CBM, a framework that addresses both jointly through a differentiable leakage loss and a Kolmogorov-Arnold Network prediction head. In the authors' experiments, f-CBM achieves the best trade-off between task accuracy, concept detection, and leakage reduction on image-and-text and text-only datasets.
Concept Bottleneck Models (CBMs) are interpretable models that route predictions through a layer of human-interpretable concepts. While widely studied in vision and, more recently, in NLP, CBMs remain largely unexplored in multimodal settings. For their explanations to be faithful, CBMs must satisfy two conditions: concepts must be properly detected, and concept representations must encode only their intended semantics, without smuggling extraneous task-relevant or inter-concept information into final predictions, a phenomenon known as leakage. Existing approaches treat concept detection and leakage mitigation as separate problems, and typically improve one at the expense of predictive accuracy. In this work, we introduce f-CBM, a faithful multimodal CBM framework built on a vision-language backbone that jointly targets both aspects through two complementary strategies: a differentiable leakage loss to mitigate leakage, and a Kolmogorov-Arnold Network prediction head that provides sufficient expressiveness to improve concept detection. Experiments demonstrate that f-CBM achieves the best trade-off between task accuracy, concept detection, and leakage reduction, and that it applies to both image-and-text and text-only datasets, making it versatile across modalities.
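To make the routing idea concrete, the following is a minimal sketch of a generic concept bottleneck forward pass, not the f-CBM architecture itself. It assumes a plain linear concept layer and a plain linear task head; the leakage loss and the Kolmogorov-Arnold Network head are omitted because the abstract does not specify their form. All names (cbm_forward, W_concept, W_task) and all dimensions are hypothetical, and the random features stand in for embeddings from a vision-language backbone.

```python
import numpy as np


def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def cbm_forward(features, W_concept, W_task):
    """Generic concept bottleneck: features -> concept scores -> task logits.

    The task head sees only the concept scores, never the raw features,
    which is what makes the concept layer a bottleneck.
    """
    concept_probs = sigmoid(features @ W_concept)  # (batch, n_concepts)
    task_logits = concept_probs @ W_task           # (batch, n_classes)
    return concept_probs, task_logits


# Hypothetical sizes: 4 samples, 16-dim embeddings, 5 concepts, 3 classes.
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 16))   # stand-in for backbone embeddings
W_concept = rng.normal(size=(16, 5))  # concept detector weights
W_task = rng.normal(size=(5, 3))      # task head weights

concept_probs, task_logits = cbm_forward(features, W_concept, W_task)
predictions = task_logits.argmax(axis=1)
```

Because the task logits depend on the input only through concept_probs, overwriting a concept score by hand and recomputing the logits changes the prediction in a traceable way. Leakage, as described above, is the case where those scores carry information beyond the concepts they are meant to represent.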