LG | AI | CV | Jun 7, 2024

Provably Better Explanations with Optimized Aggregation of Feature Attributions

arXiv:2406.05090v1 | 8 citations
Originality: Incremental advance
AI Analysis

This work addresses the reliability of post-hoc explanations, a practical concern for ML practitioners. It is an incremental advance because it builds on existing attribution methods.

The paper tackles the problem of inconsistent and unreliable feature attribution methods for explaining opaque ML models. It proposes an optimized convex combination of multiple attributions that provably improves quality criteria such as robustness and faithfulness to the model behavior. In experiments, the combination consistently outperforms individual methods and existing baselines.

Using feature attributions for post-hoc explanations is a common practice to understand and verify the predictions of opaque machine learning models. Despite the numerous techniques available, individual methods often produce inconsistent and unstable results, putting their overall reliability into question. In this work, we aim to systematically improve the quality of feature attributions by combining multiple explanations across distinct methods or their variations. For this purpose, we propose a novel approach to derive optimal convex combinations of feature attributions that yield provable improvements of desired quality criteria such as robustness or faithfulness to the model behavior. Through extensive experiments involving various model architectures and popular feature attribution techniques, we demonstrate that our combination strategy consistently outperforms individual methods and existing baselines.
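To make the idea of an optimal convex combination concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes a simple robustness proxy (the mean squared change of the combined attribution under input perturbation) and solves for the weights with projected gradient descent on the probability simplex. The names `project_to_simplex` and `optimize_weights`, the synthetic data, and the choice of proxy are all illustrative assumptions.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]                      # sort descending
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)


def optimize_weights(diffs, steps=2000):
    """Find convex weights that minimize an assumed robustness proxy.

    diffs has shape (n_pairs, n_features, n_methods): for each pair
    (input, perturbed input) and each attribution method, the difference
    between the two attribution vectors. The objective is
    mean_i ||diffs[i] @ w||^2 = w^T Q w, minimized over the simplex.
    """
    n, _, k = diffs.shape
    flat = diffs.reshape(-1, k)
    Q = flat.T @ flat / n                     # (k, k), positive semidefinite
    # Step size 1/L, where L = 2 * lambda_max(Q) bounds the gradient's
    # Lipschitz constant.
    lr = 1.0 / (2.0 * np.linalg.eigvalsh(Q)[-1] + 1e-12)
    w = np.full(k, 1.0 / k)                   # start from uniform averaging
    for _ in range(steps):
        w = project_to_simplex(w - lr * 2.0 * Q @ w)
    return w, Q


# Synthetic stand-in for three attribution methods with different
# sensitivity to input perturbations (hypothetical noise levels).
rng = np.random.default_rng(0)
noise_scales = np.array([0.5, 1.0, 2.0])
diffs = rng.normal(size=(200, 16, 3)) * noise_scales

weights, Q = optimize_weights(diffs)
combined_score = weights @ Q @ weights        # proxy value of the combination
individual_scores = np.diag(Q)                # proxy value of each method alone
print("weights:", weights)
print("combined:", combined_score, "best individual:", individual_scores.min())
```

Each individual method corresponds to a vertex of the simplex, so the minimizer over the whole simplex can never score worse on the chosen criterion than the best single method. This is the intuition behind a "provable improvement" for a convex combination. The criteria and optimization procedure used in the paper may differ from this sketch.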

