Towards Robust and Reliable Concept Representations: Reliability-Enhanced Concept Embedding Model
This work addresses the reliability of concept representations in interpretable AI models, a concern for both researchers and practitioners. The contribution is incremental, however, since it builds on existing Concept Bottleneck Models (CBMs) with targeted enhancements.
The paper tackles unreliable concept representations in Concept Bottleneck Models (CBMs), which can undermine robustness under distribution shifts. It proposes RECEM, which combines concept-level disentanglement with concept mixup to improve reliability, and reports consistent gains over baselines across multiple datasets under background and domain shifts.
Concept Bottleneck Models (CBMs) aim to enhance interpretability by predicting human-understandable concepts as intermediates for decision-making. However, these models often face challenges in ensuring reliable concept representations, which can propagate to downstream tasks and undermine robustness, especially under distribution shifts. Two inherent issues contribute to concept unreliability: sensitivity to concept-irrelevant features (e.g., background variations) and lack of semantic consistency for the same concept across different samples. To address these limitations, we propose the Reliability-Enhanced Concept Embedding Model (RECEM), which introduces a two-fold strategy: Concept-Level Disentanglement to separate irrelevant features from concept-relevant information and a Concept Mixup mechanism to ensure semantic alignment across samples. These mechanisms work together to improve concept reliability, enabling the model to focus on meaningful object attributes and generate faithful concept representations. Experimental results demonstrate that RECEM consistently outperforms existing baselines across multiple datasets, showing superior performance under background and domain shifts. These findings highlight the effectiveness of disentanglement and alignment strategies in enhancing both reliability and robustness in CBMs.
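The abstract does not specify how Concept Mixup is computed, so the following is only an illustrative sketch of one way a mixup over concept embeddings could encourage cross-sample alignment. It assumes binary concept labels, per-concept embedding vectors, and a fixed interpolation weight; the function name `concept_mixup`, its arguments, and the same-label pairing rule are hypothetical and are not taken from RECEM itself.

```python
import numpy as np


def concept_mixup(embeddings, concept_labels, lam, rng):
    """Interpolate concept embeddings between samples that share a concept label.

    Assumed (hypothetical) shapes:
      embeddings:     (batch, n_concepts, dim) float array
      concept_labels: (batch, n_concepts) array of 0/1 labels
      lam:            interpolation weight in [0, 1]
      rng:            numpy.random.Generator used to pick partners
    """
    n_concepts = embeddings.shape[1]
    mixed = embeddings.copy()
    for k in range(n_concepts):
        for value in (0, 1):
            # Samples whose label for concept k equals `value`.
            idx = np.flatnonzero(concept_labels[:, k] == value)
            if idx.size < 2:
                continue  # no partner available, leave the embedding unchanged
            # Pair each sample with a randomly chosen sample from the same group.
            partners = rng.permutation(idx)
            mixed[idx, k] = lam * embeddings[idx, k] + (1.0 - lam) * embeddings[partners, k]
    return mixed


# Toy usage: 4 samples, 2 concepts, 3-dimensional concept embeddings.
rng = np.random.default_rng(0)
labels = np.array([[1, 0], [1, 1], [0, 1], [1, 0]])
emb = rng.normal(size=(4, 2, 3))
mixed = concept_mixup(emb, labels, lam=0.7, rng=rng)
```

Under these assumptions, a downstream predictor trained on `mixed` sees, for each concept, embeddings blended across samples that agree on that concept, which pushes the representation of a concept to be interchangeable across samples. A sample that has no same-label partner in the batch keeps its original embedding.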