LG AI · May 10, 2021

Do Concept Bottleneck Models Learn as Intended?

Andrei Margeloiu, Matthew Ashman, Umang Bhatt, Yanzhi Chen, Mateja Jamnik, Adrian Weller

arXiv:2105.04289v130.1130 citations

Originality: Synthesis-oriented

AI Analysis

This work challenges the utility of CBMs for interpretable AI, suggesting that in their current form they may be ineffective in practice.

The paper investigates whether concept bottleneck models (CBMs) achieve their three intended goals of interpretability, predictability, and intervenability. It finds that they struggle to do so, because the learned concepts do not correspond to anything semantically meaningful in input space.

Concept bottleneck models map from raw inputs to concepts, and then from concepts to targets. Such models aim to incorporate pre-specified, high-level concepts into the learning procedure, and have been motivated to meet three desiderata: interpretability, predictability, and intervenability. However, we find that concept bottleneck models struggle to meet these goals. Using post hoc interpretability methods, we demonstrate that concepts do not correspond to anything semantically meaningful in input space, thus calling into question the usefulness of concept bottleneck models in their current form.
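To make the two-stage mapping concrete, the following is a minimal sketch of a concept bottleneck in NumPy. It is not the authors' architecture: the class `ConceptBottleneck`, its single linear layers, the random placeholder weights, and the method names are all assumptions made for illustration. The `interventions` argument shows one simple way to read intervenability (overwriting a predicted concept with a chosen value before the label is predicted), and `concept_saliency` uses a plain input gradient as a stand-in for the post hoc interpretability methods mentioned above.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class ConceptBottleneck:
    """Toy CBM: input x -> concept probabilities -> target logits.

    Hypothetical illustration only. Both stages are single linear maps
    with random (untrained) weights.
    """

    def __init__(self, n_inputs, n_concepts, n_classes, seed=0):
        rng = np.random.default_rng(seed)
        self.W_c = rng.normal(size=(n_inputs, n_concepts))   # input -> concepts
        self.W_y = rng.normal(size=(n_concepts, n_classes))  # concepts -> target

    def concepts(self, x):
        # Each concept is predicted as an independent probability.
        return sigmoid(x @ self.W_c)

    def predict(self, x, interventions=None):
        # The target depends on x only through the concept vector c.
        c = self.concepts(x)
        if interventions:
            c = c.copy()
            for idx, value in interventions.items():
                c[..., idx] = value  # overwrite a predicted concept
        return c @ self.W_y, c

    def concept_saliency(self, x, k):
        # Gradient of concept k with respect to a single input vector x:
        # d sigmoid(x . w_k) / dx = s * (1 - s) * w_k
        s = sigmoid(x @ self.W_c[:, k])
        return s * (1.0 - s) * self.W_c[:, k]


# Usage: one 8-dimensional input, 3 concepts, 2 classes.
model = ConceptBottleneck(n_inputs=8, n_concepts=3, n_classes=2)
x = np.linspace(-1.0, 1.0, 8)

logits, c_hat = model.predict(x)
logits_fixed, c_fixed = model.predict(x, interventions={0: 1.0})
saliency = model.concept_saliency(x, k=0)
```

In a trained model, `saliency` would indicate which input features drive concept 0. The question the paper raises is whether such attributions land on the input regions a human would associate with that concept.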
