On the Robustness of Interpretability Methods
This work addresses the reliability of interpretability methods for users of AI and machine learning systems, but its contribution is incremental because it builds on existing approaches.
The paper tackles the problem of ensuring that interpretability methods produce similar explanations for similar inputs. It introduces metrics that quantify this robustness, shows that current methods perform poorly on them, and proposes ways to enforce robustness.
We argue that robustness of explanations (i.e., that similar inputs should give rise to similar explanations) is a key desideratum for interpretability. We introduce metrics to quantify robustness and demonstrate that current methods do not perform well according to these metrics. Finally, we propose ways that robustness can be enforced on existing interpretability approaches.
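To make "similar inputs should give rise to similar explanations" concrete, here is a minimal sketch of one way to quantify it: a local Lipschitz-style estimate, i.e. the largest ratio of explanation change to input change over points near x. This is an illustrative assumption, not necessarily the authors' exact formulation. The helper name local_lipschitz_estimate, the random-search strategy, the radius eps, and the three toy gradient "explainers" are all hypothetical choices made for this example.

```python
import numpy as np


def local_lipschitz_estimate(explain, x, eps=0.1, n_samples=200, seed=0):
    """Hypothetical helper: largest observed ratio
    ||explain(x) - explain(x')|| / ||x - x'|| over random x' within an
    L2 ball of radius eps around x. Random search only lower-bounds the
    true maximum over the ball."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    e_x = np.asarray(explain(x), dtype=float)
    best = 0.0
    for _ in range(n_samples):
        # Random unit direction and a nonzero radius <= eps give a point in the ball.
        direction = rng.normal(size=x.shape)
        direction /= np.linalg.norm(direction)
        radius = eps * rng.uniform(1e-3, 1.0)
        x_prime = x + radius * direction
        e_prime = np.asarray(explain(x_prime), dtype=float)
        ratio = np.linalg.norm(e_prime - e_x) / np.linalg.norm(x_prime - x)
        best = max(best, float(ratio))
    return best


# Toy gradient "explanations" (illustrative assumptions, not from the paper).
w = np.array([1.0, -2.0, 0.5])


def linear_explainer(x):
    # Gradient of w . x is constant, so the explanation never changes.
    return w


def smooth_explainer(x):
    # Gradient of sum(sin(x)) is cos(x).
    return np.cos(x)


def sharp_explainer(x):
    # Gradient of sum(sin(10 x)) is 10 cos(10 x), which varies much faster.
    return 10.0 * np.cos(10.0 * x)


x0 = np.full(3, 0.3)
print(local_lipschitz_estimate(linear_explainer, x0))  # prints 0.0
print(local_lipschitz_estimate(smooth_explainer, x0))
print(local_lipschitz_estimate(sharp_explainer, x0))
```

Higher values mean less robust explanations: the constant explainer scores exactly zero, while the rapidly varying one should score far higher than the smooth one at the same point. Random sampling is used here only for simplicity; because it can miss the worst-case neighbor, a more thorough estimate would optimize over the ball instead.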