LG · AI · Jun 24, 2021

Promises and Pitfalls of Black-Box Concept Learning Models

arXiv:2106.13314v1 · 139 citations
Originality: Incremental advance
AI Analysis

This work addresses the reliability of interpretable AI for users who need trustworthy explanations, and it highlights a critical pitfall in current methods.

The paper tackles the problem of concept learning models encoding information beyond their predefined concepts, which makes their interpretations misleading, and demonstrates that existing mitigation strategies are insufficient.

Machine learning models that incorporate concept learning as an intermediate step in their decision-making process can match the performance of black-box predictive models while retaining the ability to explain outcomes in human-understandable terms. However, we demonstrate that the concept representations learned by these models encode information beyond the pre-defined concepts, and that natural mitigation strategies do not fully work, rendering the interpretation of the downstream prediction misleading. We describe the mechanism underlying the information leakage and suggest recourse for mitigating its effects.
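The leakage described above can be illustrated with a toy example. This is a minimal sketch, not the authors' experiment: it assumes a hypothetical soft concept predictor (`concept_score`) whose confidence shifts slightly with a non-concept bit `z`, and a task label that depends only on `z`. Thresholding the scores recovers the concept perfectly, yet a downstream predictor that reads the raw probabilities can still recover the label, while one that sees only the binarized concepts does no better than chance. The helper `best_accuracy` and the weights 4.0 and 0.5 are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict
from itertools import product


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def concept_score(c: int, z: int) -> float:
    """Hypothetical soft concept predictor for concept c.

    The 0.5 weight on the non-concept bit z is an illustrative assumption:
    it shifts the model's confidence slightly without ever flipping the
    thresholded concept prediction.
    """
    return sigmoid(4.0 * (2 * c - 1) + 0.5 * (2 * z - 1))


def best_accuracy(reps, labels) -> float:
    """Accuracy of the best lookup-table predictor that sees only `reps`."""
    groups = defaultdict(Counter)
    for r, y in zip(reps, labels):
        groups[r][y] += 1
    correct = sum(max(counts.values()) for counts in groups.values())
    return correct / len(labels)


# Toy dataset: every combination of concept c and extra bit z, equally often.
# The downstream label depends only on z, which is NOT a predefined concept.
data = list(product([0, 1], repeat=2))
labels = [z for _, z in data]
scores = [concept_score(c, z) for c, z in data]

hard = [int(s >= 0.5) for s in scores]  # binarized concepts
soft = [round(s, 3) for s in scores]    # soft concept probabilities

concept_acc = sum(h == c for h, (c, _) in zip(hard, data)) / len(data)
hard_acc = best_accuracy(hard, labels)
soft_acc = best_accuracy(soft, labels)

print(f"concept accuracy: {concept_acc:.2f}")              # 1.00
print(f"task accuracy from hard concepts: {hard_acc:.2f}")  # 0.50
print(f"task accuracy from soft concepts: {soft_acc:.2f}")  # 1.00
```

In this toy case, binarizing the concepts closes the channel only because the encoder's extra signal lives entirely in its confidence; the abstract reports that natural mitigation strategies do not fully work in general, so the sketch should not be read as a fix.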

Code Implementations: 2 repos
