Measuring Leakage in Concept-Based Methods: An Information Theoretic Approach
This work addresses leakage quantification for interpretable AI methods, but the contribution is incremental: the measure is evaluated mainly in controlled synthetic settings, with no application to real-world datasets.
The paper tackles unintended information leakage in Concept Bottleneck Models (CBMs), which compromises interpretability, by introducing an information-theoretic measure to quantify it. Synthetic experiments validate the measure, showing that feature and concept dimensionality significantly influence leakage and that XGBoost is the most reliable estimator.
Concept Bottleneck Models (CBMs) aim to enhance interpretability by structuring predictions around human-understandable concepts. However, unintended information leakage, where predictive signals bypass the concept bottleneck, compromises their transparency. This paper introduces an information-theoretic measure to quantify leakage in CBMs, capturing the extent to which concept embeddings encode additional, unintended information beyond the specified concepts. We validate the measure through controlled synthetic experiments, demonstrating its effectiveness in detecting leakage trends across various configurations. Our findings highlight that feature and concept dimensionality significantly influence leakage, and that classifier choice impacts measurement stability, with XGBoost emerging as the most reliable estimator. Additionally, preliminary investigations indicate that the measure exhibits the anticipated behavior when applied to soft joint CBMs, suggesting its reliability in leakage quantification beyond fully synthetic settings. While this study rigorously evaluates the measure in controlled synthetic experiments, future work can extend its application to real-world datasets.
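To make the idea of "information beyond the specified concepts" concrete, the sketch below estimates a conditional mutual information, I(Y; Z | C) = H(Y | C) - H(Y | C, Z), on a tiny discrete toy problem. This is an illustrative assumption, not the paper's definition: the abstract does not state the exact form of the measure, and the paper estimates it with classifiers such as XGBoost, whereas this sketch swaps in a simple count-based plug-in estimator that only works for small discrete variables. The names `conditional_entropy`, `leakage_estimate`, and the toy variables `c`, `u`, `y` are hypothetical.

```python
import math
from collections import Counter


def conditional_entropy(targets, contexts):
    """Plug-in estimate of H(target | context) in bits from paired samples."""
    joint = Counter(zip(contexts, targets))
    marginal = Counter(contexts)
    n = len(targets)
    return -sum(
        count / n * math.log2(count / marginal[ctx])
        for (ctx, _), count in joint.items()
    )


def leakage_estimate(labels, concepts, embeddings):
    """Assumed leakage proxy: I(Y; Z | C) = H(Y | C) - H(Y | C, Z), in bits."""
    h_given_concepts = conditional_entropy(labels, concepts)
    h_given_both = conditional_entropy(labels, list(zip(concepts, embeddings)))
    return h_given_concepts - h_given_both


# Toy data (hypothetical): one annotated concept c, one unannotated feature u,
# and a label y = c XOR u that cannot be predicted from c alone.
pairs = [(c, u) for c in (0, 1) for u in (0, 1)] * 250
concepts = [c for c, _ in pairs]
labels = [c ^ u for c, u in pairs]

# A "leaky" embedding also carries u; a "clean" embedding carries only c.
leaky_embeddings = [(c, u) for c, u in pairs]
clean_embeddings = [(c,) for c, _ in pairs]

leaky_bits = leakage_estimate(labels, concepts, leaky_embeddings)
clean_bits = leakage_estimate(labels, concepts, clean_embeddings)
```

On this toy data the leaky embedding yields one bit of leakage and the clean embedding yields zero, since the label is fully determined once `u` is known and carries no extra information otherwise. For continuous, high-dimensional embeddings a count-based estimator is not usable, which is presumably why a classifier-based estimator is needed in practice.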