LG Oct 10, 2023

AttributionLab: Faithfulness of Feature Attribution Under Controllable Environments

arXiv:2310.06514v2 4 citations h-index: 17
Originality: Incremental advance
AI Analysis

This work addresses unreliable evaluation of feature attribution methods, a concern for researchers and practitioners in explainable AI. It is an incremental advance, since it builds on existing controlled-testing approaches.

The paper tackles the problem of evaluating the faithfulness of feature attribution methods by introducing AttributionLab, a controlled environment where both data and network weights are designed to have known relevant features. The result is a framework that serves as a sanity check, showing that if an attribution method fails in this controlled setting, it is likely unreliable in real-world scenarios.

Feature attribution explains neural network outputs by identifying relevant input features. The attribution has to be faithful, meaning that the attributed features must mirror the input features that influence the output. One recent trend to test faithfulness is to fit a model on designed data with known relevant features and then compare attributions with ground truth input features. This idea assumes that the model learns to use all and only these designed features, for which there is no guarantee. In this paper, we solve this issue by designing the network and manually setting its weights, along with designing data. The setup, AttributionLab, serves as a sanity check for faithfulness: If an attribution method is not faithful in a controlled environment, it can be unreliable in the wild. The environment is also a laboratory for controlled experiments by which we can analyze attribution methods and suggest improvements.
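
The idea of a network with hand-set weights and known relevant features can be illustrated with a toy sketch. This is not the AttributionLab architecture or its data: the two-unit ReLU network, the function names (`build_network`, `grad_times_input`, `relevance_mass`), and the choice of gradient-times-input as the method under test are all assumptions made for illustration. Because the weights on the irrelevant inputs are set to zero by construction, the ground truth is known exactly, and an attribution can be scored by how much of its mass falls on the relevant features.

```python
import numpy as np

def build_network(n_features, relevant):
    """Hand-set weights (hypothetical toy design): only `relevant` inputs are wired in."""
    W1 = np.zeros((2, n_features))
    W1[0, relevant] = 1.0    # unit 0 fires on positive sums of relevant inputs
    W1[1, relevant] = -1.0   # unit 1 fires on negative sums of relevant inputs
    w2 = np.array([1.0, 1.0])
    return W1, w2            # output equals |sum of relevant inputs|

def forward(x, W1, w2):
    return float(w2 @ np.maximum(W1 @ x, 0.0))

def grad_times_input(x, W1, w2):
    """Gradient-times-input attribution, computed analytically for this ReLU net."""
    gate = (W1 @ x > 0).astype(float)   # which hidden units are active
    grad = W1.T @ (w2 * gate)           # d(output)/d(input)
    return grad * x

def relevance_mass(attr, relevant):
    """Fraction of absolute attribution that lands on ground-truth features."""
    a = np.abs(attr)
    total = a.sum()
    return float(a[relevant].sum() / total) if total > 0 else 0.0

rng = np.random.default_rng(0)
relevant = np.array([0, 1, 2])          # assumed ground-truth feature indices
x = rng.normal(size=6)
W1, w2 = build_network(x.size, relevant)

faithful = grad_times_input(x, W1, w2)
unfaithful = np.abs(x)                  # a strawman that ignores the model entirely

print("grad*input mass on relevant features:", relevance_mass(faithful, relevant))
print("input-magnitude mass on relevant features:", relevance_mass(unfaithful, relevant))
```

In this toy setting the model-aware attribution places all of its mass on the designed features, while the strawman that only looks at the input leaks mass onto features the network provably ignores. A real sanity check in this spirit would use richer networks and data, but the scoring logic follows the same pattern.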

