GRANITE: A Generalized Regional Framework for Identifying Agreement in Feature-Based Explanations
This work addresses conflicting explanations in interpretable machine learning, an issue crucial for practitioners who rely on feature-based methods, though the contribution appears incremental because it builds on existing regional approaches.
The paper tackles disagreement among feature-based explanation methods by proposing GRANITE, a generalized regional framework that partitions the feature space into regions where interaction and distribution influences are minimized, resulting in more consistent and interpretable explanations.
Feature-based explanation methods aim to quantify how features influence the model's behavior, either locally or globally, but different methods often disagree, producing conflicting explanations. This disagreement arises primarily from two sources: how feature interactions are handled and how feature dependencies are incorporated. We propose GRANITE, a generalized regional explanation framework that partitions the feature space into regions where interaction and distribution influences are minimized. This approach aligns different explanation methods, yielding more consistent and interpretable explanations. GRANITE unifies existing regional approaches, extends them to feature groups, and introduces a recursive partitioning algorithm to estimate such regions. We demonstrate its effectiveness on real-world datasets, providing a practical tool for consistent and interpretable feature explanations.
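The abstract does not spell out GRANITE's partitioning algorithm. To make the general idea of regional explanations concrete, here is a minimal Python sketch under stated assumptions. It recursively splits the data on a threshold of another feature whenever that split substantially reduces the spread (sum of squared deviations) of per-instance local effects, a common proxy for interaction influence. The names `partition`, `best_split`, `sse`, and `Region`, the parameters `max_depth`, `min_samples`, and `min_rel_gain`, and the heterogeneity criterion itself are all hypothetical. They are not the paper's objective or API. The sketch also ignores feature groups and distribution influence.

```python
"""Hypothetical sketch of recursive partitioning for regional explanations.
Not GRANITE's algorithm; the split criterion is an illustrative assumption."""
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Condition = Tuple[int, str, float]  # (feature index, "<=" or ">", threshold)


@dataclass
class Region:
    conditions: List[Condition]  # conjunction of conditions defining the region
    indices: List[int]           # rows of X that fall in the region
    mean_effect: float           # regional (averaged) effect
    heterogeneity: float         # remaining spread of local effects


def sse(values: Sequence[float]) -> float:
    """Sum of squared deviations from the mean (0.0 for empty input)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(X, effects, idx, features, min_samples):
    """Return (gain, feature, threshold, left, right) for the split that most
    reduces the heterogeneity of local effects within idx, or None."""
    parent = sse([effects[i] for i in idx])
    best: Optional[tuple] = None
    for f in features:
        values = sorted({X[i][f] for i in idx})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [i for i in idx if X[i][f] <= t]
            right = [i for i in idx if X[i][f] > t]
            if len(left) < min_samples or len(right) < min_samples:
                continue
            child = sse([effects[i] for i in left]) + sse([effects[i] for i in right])
            gain = parent - child
            if best is None or gain > best[0]:
                best = (gain, f, t, left, right)
    return best


def partition(X, effects, features, max_depth=2, min_samples=2, min_rel_gain=0.1):
    """Recursively split rows of X into regions with homogeneous local effects.

    A split is kept only if it removes at least `min_rel_gain` of the parent's
    heterogeneity; otherwise the node becomes a leaf Region.
    """
    regions: List[Region] = []

    def recurse(idx, conditions, depth):
        vals = [effects[i] for i in idx]
        h = sse(vals)
        split = None
        if depth < max_depth and h > 0:
            split = best_split(X, effects, idx, features, min_samples)
        if split is not None and split[0] >= min_rel_gain * h:
            _, f, t, left, right = split
            recurse(left, conditions + [(f, "<=", t)], depth + 1)
            recurse(right, conditions + [(f, ">", t)], depth + 1)
        else:
            regions.append(Region(conditions, idx, sum(vals) / len(vals), h))

    recurse(list(range(len(X))), [], 0)
    return regions


# Toy model f(x0, x1) = x0 * sign(x1). The local effect of x0 (df/dx0, given
# analytically here) is +1 or -1 depending on x1: a pure interaction that a
# single global average effect (exactly 0 here) would hide.
X = [[a / 4, b] for a in range(-4, 5) for b in (-1.0, -0.5, 0.5, 1.0)]
effects = [1.0 if x1 > 0 else -1.0 for _, x1 in X]

# Split only on x1, i.e. on features other than the one being explained.
for region in partition(X, effects, features=[1]):
    print(region.conditions, region.mean_effect, region.heterogeneity)
```

On this toy model, a single split at x1 = 0 yields two regions, x1 <= 0 with mean effect -1.0 and x1 > 0 with mean effect 1.0, each with zero remaining heterogeneity. A global average effect of 0 would hide both. Within each region, the interaction no longer distorts the explanation of x0, which is the intuition the abstract describes.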