LG | Nov 21, 2022

Learn to explain yourself, when you can: Equipping Concept Bottleneck Models with the ability to abstain on their concept predictions

Joshua Lockhart, Daniele Magazzeni, Manuela Veloso

arXiv:2211.11690v2 | 10.48 citations | h-index: 32

Originality: Synthesis-oriented

AI Analysis

This work addresses the robustness of interpretable models for users who need reliable explanations. The contribution is incremental, since it builds on the existing Concept Bottleneck Model (CBM) framework.

The paper tackles the reliance of Concept Bottleneck Models (CBMs) on human-provided concept labels by enabling them to abstain from predicting a concept when uncertain, so that the model offers a rationale only when it is confident the rationale is correct.

The Concept Bottleneck Models (CBMs) of Koh et al. [2020] provide a means to ensure that a neural-network-based classifier bases its predictions solely on human-understandable concepts. The concept labels, or rationales as we refer to them, are learned by the concept labeling component of the CBM. Another component learns to predict the target classification label from these predicted concept labels. Unfortunately, these models are heavily reliant on human-provided concept labels for each datapoint. To enable CBMs to behave robustly when these labels are not readily available, we show how to equip them with the ability to abstain from predicting concepts when the concept labeling component is uncertain. In other words, our model learns to provide rationales for its predictions, but only when it is sure the rationale is correct.
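The two-stage structure described in the abstract can be illustrated with a small sketch. This is not the authors' method: the abstract does not say how abstention is learned, so the example below assumes a simple confidence threshold on each concept probability. The names `predict_concepts`, `predict_label`, `ABSTAIN`, the threshold value, and the toy weights are all hypothetical.

```python
from typing import List, Optional

# Hypothetical marker meaning "no rationale offered for this concept".
ABSTAIN = None


def predict_concepts(probs: List[float], threshold: float = 0.9) -> List[Optional[int]]:
    """Map per-concept probabilities to 0/1 labels, abstaining when unsure.

    Assumption: confidence is max(p, 1 - p), and the model abstains
    whenever that confidence falls below `threshold`.
    """
    concepts: List[Optional[int]] = []
    for p in probs:
        confidence = max(p, 1.0 - p)
        if confidence >= threshold:
            concepts.append(1 if p >= 0.5 else 0)
        else:
            concepts.append(ABSTAIN)
    return concepts


def predict_label(concepts: List[Optional[int]], weights: List[float], bias: float = 0.0) -> int:
    """Toy linear label predictor that sees only the concept bottleneck.

    Assumption: an abstained concept contributes nothing to the score.
    """
    score = bias
    for c, w in zip(concepts, weights):
        if c is not ABSTAIN:
            score += w * (1.0 if c == 1 else -1.0)
    return 1 if score > 0 else 0


# Stand-in for the output of a concept labeling network on one datapoint.
concept_probs = [0.97, 0.55, 0.02]
concepts = predict_concepts(concept_probs)
label = predict_label(concepts, weights=[1.0, 2.0, 0.5])
print(concepts, label)
```

Here the second concept (probability 0.55) falls below the threshold, so no rationale is given for it, and the label is predicted from the two remaining concepts alone.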
