Zero-shot Concept Bottleneck Models
This work addresses the limited interpretability and intervenability of neural network models for users who require explainable AI, particularly in domains where target task training data is scarce or unavailable.
Conventional concept bottleneck models require target task training. The authors remove this requirement by introducing zero-shot concept bottleneck models, which predict concepts and labels without training neural networks while remaining interpretable and intervenable. The models draw on a large-scale concept bank: concept retrieval finds the concepts related to an input, and concept regression selects the essential ones among them.
Concept bottleneck models (CBMs) are inherently interpretable and intervenable neural network models that explain their final label prediction through the intermediate prediction of high-level semantic concepts. However, they require target task training to learn the input-to-concept and concept-to-label mappings, which incurs the cost of collecting target datasets and spending training resources. In this paper, we present \textit{zero-shot concept bottleneck models} (Z-CBMs), which predict concepts and labels in a fully zero-shot manner without training neural networks. Z-CBMs utilize a large-scale concept bank, composed of millions of vocabulary terms extracted from the web, to describe arbitrary inputs in various domains. For the input-to-concept mapping, we introduce concept retrieval, which dynamically finds input-related concepts through cross-modal search over the concept bank. For the concept-to-label inference, we apply concept regression, which selects essential concepts from the retrieved concepts by sparse linear regression. Through extensive experiments, we confirm that Z-CBMs provide interpretable and intervenable concepts without any additional training. Code will be available at https://github.com/yshinya6/zcbm.
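The two steps named above, concept retrieval followed by concept regression, can be illustrated with a minimal sketch. It is not the released implementation, and it rests on several assumptions: random unit vectors stand in for the embeddings a pretrained vision-language encoder would produce, the concept bank holds 50 entries rather than millions, the sparse regression is solved with a generic ISTA lasso solver, and the final label is chosen by comparing the sparse reconstruction against label embeddings (the abstract does not spell out this last step). All function and variable names are hypothetical.

```python
import numpy as np

def normalize(x):
    """Scale vectors to unit L2 norm along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def retrieve_concepts(image_emb, concept_bank, k):
    """Concept retrieval: indices of the k concepts most similar to the input."""
    sims = normalize(concept_bank) @ normalize(image_emb)  # cosine similarity
    return np.argsort(-sims)[:k]

def sparse_regression(A, y, lam=0.05, n_iter=500):
    """Lasso via ISTA: minimize 0.5 * ||A w - y||^2 + lam * ||w||_1 over w."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of the gradient
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = w - step * (A.T @ (A @ w - y))  # gradient step on the squared error
        w = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold
    return w

# Toy data (assumption): random stand-ins for text and image embeddings.
rng = np.random.default_rng(0)
d, n_concepts, k = 128, 50, 5
concept_bank = normalize(rng.standard_normal((n_concepts, d)))
# The "image" is built mostly from concepts 3 and 10, plus a little noise.
image_emb = normalize(
    0.6 * concept_bank[3] + 0.4 * concept_bank[10] + 0.01 * rng.standard_normal(d)
)

# Step 1: input-to-concept mapping by cross-modal search over the bank.
retrieved = retrieve_concepts(image_emb, concept_bank, k)

# Step 2: keep only essential concepts by sparse linear regression.
A = concept_bank[retrieved].T  # columns are the retrieved concept embeddings
weights = sparse_regression(A, image_emb)
selected = retrieved[weights != 0]

# Label inference (assumption): match the sparse reconstruction to label embeddings.
label_embs = normalize(
    np.stack([concept_bank[3] + concept_bank[10], rng.standard_normal(d)])
)
reconstructed = A @ weights
pred = int(np.argmax(label_embs @ reconstructed))

print("retrieved concepts:", retrieved)
print("regression weights:", np.round(weights, 3))
print("selected concepts:", selected, "predicted label:", pred)
```

In this sketch the nonzero regression weights serve as the explanation of the prediction. An intervention would amount to editing those weights, for example zeroing one concept, and recomputing the reconstruction before the label comparison.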