LG, CV · Nov 30, 2023

Benchmarking and Enhancing Disentanglement in Concept-Residual Models

arXiv:2312.00192v1 · 5 citations · h-index: 16
Originality: Incremental advance
AI Analysis

This work addresses a specific issue in interpretable machine learning that is relevant to both researchers and practitioners. It is incremental because it builds on prior methods to enhance disentanglement.

The paper tackles information leakage in concept bottleneck models with residual channels, which harms interpretability. It proposes three novel disentanglement methods that balance performance against interpretability and evaluates them empirically on the CUB, OAI, and CIFAR 100 datasets, showing how each method affects intervention ability and task performance.

Concept bottleneck models (CBMs) are interpretable models that first predict a set of semantically meaningful features, i.e., concepts, from observations that are subsequently used to condition a downstream task. However, the model's performance strongly depends on the engineered features and can severely suffer from incomplete sets of concepts. Prior works have proposed a side channel -- a residual -- that allows for unconstrained information flow to the downstream task, thus improving model performance but simultaneously introducing information leakage, which is undesirable for interpretability. This work proposes three novel approaches to mitigate information leakage by disentangling concepts and residuals, investigating the critical balance between model performance and interpretability. Through extensive empirical analysis on the CUB, OAI, and CIFAR 100 datasets, we assess the performance of each disentanglement method and provide insights into when they work best. Further, we show how each method impacts the ability to intervene over the concepts and their subsequent impact on task performance.
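To make the architecture in the abstract concrete, here is a minimal sketch of a concept bottleneck model with a residual side channel, including a concept intervention. It is an illustration under stated assumptions, not the authors' implementation: the class name `ConceptResidualModel`, the single linear layers, the random untrained weights, and all dimensions are hypothetical, and none of the paper's three disentanglement methods is shown.

```python
import math
import random


def linear(weights, x):
    """Multiply a weight matrix (list of rows) by an input vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


class ConceptResidualModel:
    """Hypothetical toy model: x -> (concepts, residual) -> task logits."""

    def __init__(self, n_inputs, n_concepts, n_residual, n_classes, seed=0):
        rng = random.Random(seed)
        # Assumption: one untrained linear layer per component.
        self.w_concept = random_matrix(n_concepts, n_inputs, rng)
        self.w_residual = random_matrix(n_residual, n_inputs, rng)
        self.w_task = random_matrix(n_classes, n_concepts + n_residual, rng)

    def forward(self, x, interventions=None):
        # Interpretable path: concept probabilities in (0, 1).
        concepts = [sigmoid(z) for z in linear(self.w_concept, x)]
        # Side channel: unconstrained residual, the potential source of leakage.
        residual = linear(self.w_residual, x)
        # Intervention: overwrite selected predicted concepts with given values.
        for index, value in (interventions or {}).items():
            concepts[index] = float(value)
        # The downstream task sees the concepts concatenated with the residual.
        logits = linear(self.w_task, concepts + residual)
        return concepts, residual, logits


model = ConceptResidualModel(n_inputs=4, n_concepts=3, n_residual=2, n_classes=5)
x = [0.5, -1.0, 0.25, 2.0]
concepts, residual, logits = model.forward(x)
fixed_concepts, fixed_residual, fixed_logits = model.forward(x, interventions={0: 1.0})
```

In this sketch an intervention changes the task output only through the concept path, while the residual is left untouched. If the residual also encodes concept information, correcting a concept may move the prediction less than expected, which is the leakage problem the abstract describes.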
