LG · AI · CL · May 16, 2023

The Weighted Möbius Score: A Unified Framework for Feature Attribution

arXiv:2305.09204v1 · 1 citation
Originality: Incremental advance
AI Analysis

The paper gives researchers and practitioners in explainable AI a unified framework for feature attribution. The advance is incremental, since it builds on existing attribution work.

Feature attribution methods explain black-box model predictions, but they lack a unified framework. The paper addresses this by introducing the Weighted Möbius Score, showing that many existing methods are special cases of it, and identifying new methods.

Feature attribution aims to explain the reasoning behind a black-box model's prediction by identifying the impact of each feature on the prediction. Recent work has extended feature attribution to interactions between multiple features. However, the lack of a unified framework has led to a proliferation of methods that are often not directly comparable. This paper introduces a parameterized attribution framework -- the Weighted Möbius Score -- and (i) shows that many different attribution methods for both individual features and feature interactions are special cases and (ii) identifies some new methods. By studying the vector space of attribution methods, our framework utilizes standard linear algebra tools and provides interpretations in various fields, including cooperative game theory and causal mediation analysis. We empirically demonstrate the framework's versatility and effectiveness by applying these attribution methods to feature interactions in sentiment analysis and chain-of-thought prompting.
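
To make the "special cases" claim concrete, here is a minimal sketch of the general idea: an attribution score written as a weighted sum of Möbius coefficients of a set function. It is not the paper's exact definition or notation. The function names, the toy value function, and the feature names "a", "b", "c" are all hypothetical. The only facts relied on are the standard Möbius transform, m(S) = sum over T ⊆ S of (-1)^(|S|-|T|) v(T), and the known result that the Shapley value and the Shapley interaction index are recovered with the weight 1 / (|S| - |T| + 1).

```python
from itertools import combinations


def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = tuple(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def mobius_transform(v, features):
    """Möbius coefficients m(S) = sum_{T <= S} (-1)^(|S|-|T|) * v(T)."""
    return {
        S: sum((-1) ** (len(S) - len(T)) * v(T) for T in subsets(S))
        for S in subsets(features)
    }


def weighted_mobius_attribution(m, target, weight):
    """Sum weight(S, target) * m(S) over all S that contain `target`.

    Illustrative only: the choice of `weight` selects the attribution method.
    """
    return sum(weight(S, target) * m_S for S, m_S in m.items() if target <= S)


def shapley_weight(S, T):
    """Weight that yields the Shapley value (|T| = 1) or interaction index."""
    return 1.0 / (len(S) - len(T) + 1)


def toy_value(S):
    """Hypothetical model output when only the features in S are present."""
    value = 0.0
    if "a" in S:
        value += 1.0
    if "b" in S:
        value += 2.0
    if "a" in S and "b" in S:
        value += 3.0  # pairwise interaction between a and b
    return value


features = ("a", "b", "c")
m = mobius_transform(toy_value, features)

# Individual attributions (Shapley values under the weight above).
shapley = {
    f: weighted_mobius_attribution(m, frozenset({f}), shapley_weight)
    for f in features
}

# Pairwise attribution for the interaction between a and b.
interaction_ab = weighted_mobius_attribution(
    m, frozenset({"a", "b"}), shapley_weight
)

print(shapley)
print(interaction_ab)
```

Swapping `shapley_weight` for another weight function gives a different attribution method from the same Möbius coefficients, which is the sense in which a parameterized score can cover several existing methods at once.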

Code Implementations: 1 repo
