LG · CR · Nov 7, 2022

Towards learning to explain with concept bottleneck models: mitigating information leakage

arXiv:2211.03656v1 · 7 citations · h-index: 32
Originality: Incremental advance
AI Analysis

This work addresses trust issues that arise when users rely on concept-based explanations in interpretable AI. It is an incremental advance: it builds on existing methods to fix a specific flaw.

The paper tackles information leakage in concept bottleneck models that use soft concept labels, a problem that undermines trust in the model, and demonstrates that Monte-Carlo Dropout can mitigate this leakage to produce more reliable concept predictions.

Concept bottleneck models perform classification by first predicting which of a list of human-provided concepts are true about a datapoint. Then a downstream model uses these predicted concept labels to predict the target label. The predicted concepts act as a rationale for the target prediction. Model trust issues emerge in this paradigm when soft concept labels are used: it has previously been observed that extra information about the data distribution leaks into the concept predictions. In this work we show how Monte-Carlo Dropout can be used to attain soft concept predictions that do not contain leaked information.
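To make the two-stage pipeline in the abstract concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not say how the Monte-Carlo Dropout samples are turned into soft concept values, so this sketch assumes each stochastic pass is thresholded to a hard 0/1 concept and the votes are averaged. The linear concept predictor, the dropout on input features, and all names (`concept_logits`, `mc_dropout_concepts`, `predict_label`) are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def concept_logits(x, weights, biases, drop_p, rng):
    """One stochastic forward pass of a toy linear concept predictor.

    Dropout stays active at prediction time, which is the core of MC Dropout.
    Applying it to the input features is an assumption made for brevity.
    """
    keep = 1.0 - drop_p
    # Inverted dropout: zero each feature with probability drop_p, rescale the rest.
    masked = [xi / keep if rng.random() < keep else 0.0 for xi in x]
    return [sum(w * m for w, m in zip(row, masked)) + b
            for row, b in zip(weights, biases)]


def mc_dropout_concepts(x, weights, biases, drop_p=0.5, n_samples=200, seed=0):
    """Soft concept values as the fraction of dropout passes that vote 'true'.

    Assumption: each pass is thresholded to a hard 0/1 concept before averaging,
    so the soft value reflects disagreement between passes rather than the raw
    sigmoid output of a single deterministic pass.
    """
    rng = random.Random(seed)
    votes = [0] * len(weights)
    for _ in range(n_samples):
        logits = concept_logits(x, weights, biases, drop_p, rng)
        for k, z in enumerate(logits):
            if sigmoid(z) >= 0.5:
                votes[k] += 1
    return [v / n_samples for v in votes]


def predict_label(concepts, label_weights, label_biases):
    """Downstream model: picks the target label from concept values only."""
    scores = [sum(w * c for w, c in zip(row, concepts)) + b
              for row, b in zip(label_weights, label_biases)]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy, made-up parameters: 3 input features, 2 concepts, 2 target labels.
W_concept = [[2.0, -1.0, 0.5], [-1.5, 1.0, 1.0]]
b_concept = [0.0, -0.2]
W_label = [[1.0, 0.0], [0.0, 1.0]]
b_label = [0.0, 0.0]

x = [1.0, 0.5, -0.3]
soft_concepts = mc_dropout_concepts(x, W_concept, b_concept)
label = predict_label(soft_concepts, W_label, b_label)
```

The downstream model sees only the concept values, never `x`, which is what makes the concepts usable as a rationale. With `drop_p=0.0` every pass agrees and the outputs collapse to hard 0/1 labels; with dropout enabled, the values in between come from disagreement across passes.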

