Adaptive Test-Time Intervention for Concept Bottleneck Models
This work addresses the trade-off between interpretability and prediction performance in concept-based machine learning models. It is aimed at researchers and practitioners who use concept-based approaches and represents an incremental improvement.
The paper tackles the problem of maintaining interpretability in concept bottleneck models (CBMs) without sacrificing prediction performance. It proposes FIGS-BD, a method that distills the concept-to-target portion of a CBM into an interpretable tree-based model, and demonstrates across four datasets that adaptive test-time intervention guided by this method identifies key concepts that improve performance in human-in-the-loop settings.
Concept bottleneck models (CBMs) aim to improve model interpretability by predicting human-level "concepts" in a bottleneck within a deep learning model architecture. However, the way the predicted concepts are used to predict the target either remains a black box or is simplified to maintain interpretability at the cost of prediction performance. We propose to use Fast Interpretable Greedy Sum-Trees (FIGS) to obtain Binary Distillation (BD). This new method, called FIGS-BD, distills a binary-augmented concept-to-target portion of the CBM into an interpretable tree-based model while maintaining the competitive prediction performance of the CBM teacher. FIGS-BD can be used in downstream tasks to explain and decompose CBM predictions into interpretable binary-concept-interaction attributions and to guide adaptive test-time intervention. Across four datasets, we demonstrate that our adaptive test-time intervention identifies key concepts that significantly improve performance in realistic human-in-the-loop settings that allow only a limited number of concept interventions.
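The idea of intervening on only a few concepts under a budget can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not the FIGS-BD procedure itself: the student is a hand-written sum of rules over binary concepts (a stand-in for a distilled sum-of-trees model), the 0.5 binarization threshold is arbitrary, and the ranking heuristic (how far the output moves when a single concept is flipped) is one plausible way to prioritize concepts, not the selection rule used in the paper. The names `Rule`, `binarize`, `predict`, `rank_concepts`, and `intervene` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical stand-in for a distilled student: a sum of rules, where each
# rule adds `weight` to the output when every listed binary concept equals 1.
Rule = Tuple[Tuple[int, ...], float]


def binarize(concept_probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Turn predicted concept probabilities into 0/1 concepts (assumed threshold)."""
    return [1 if p >= threshold else 0 for p in concept_probs]


def predict(rules: Sequence[Rule], bias: float, concepts: Sequence[int]) -> float:
    """Additive prediction: bias plus the weight of every rule that fires."""
    return bias + sum(w for idxs, w in rules if all(concepts[i] for i in idxs))


def rank_concepts(rules: Sequence[Rule], bias: float, concepts: Sequence[int]) -> List[int]:
    """Order concepts by how far flipping each one alone moves the output."""
    base = predict(rules, bias, concepts)

    def sensitivity(i: int) -> float:
        flipped = list(concepts)
        flipped[i] = 1 - flipped[i]
        return abs(predict(rules, bias, flipped) - base)

    # sorted() is stable, so ties keep the lower concept index first.
    return sorted(range(len(concepts)), key=lambda i: -sensitivity(i))


def intervene(
    rules: Sequence[Rule],
    bias: float,
    predicted: Sequence[int],
    true: Sequence[int],
    budget: int,
) -> Tuple[List[int], List[int]]:
    """Replace the `budget` highest-ranked predicted concepts with expert values."""
    chosen = rank_concepts(rules, bias, predicted)[:budget]
    corrected = list(predicted)
    for i in chosen:
        corrected[i] = true[i]
    return corrected, chosen


# Toy example with made-up rules and concept values.
RULES: List[Rule] = [((0,), 2.0), ((1, 2), -1.5), ((3,), 0.25)]
BIAS = 0.0
predicted = binarize([0.3, 0.9, 0.8, 0.6])
true = [1, 1, 0, 1]

corrected, chosen = intervene(RULES, BIAS, predicted, true, budget=2)
print(chosen, corrected, predict(RULES, BIAS, corrected))
```

In this toy case the ranking asks the expert about a concept that was already predicted correctly, which reflects the realistic setting: the ranking cannot see which concepts are wrong, only which ones matter most to the output.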