Sound Explanation for Trustworthy Machine Learning
This work addresses the need for trustworthy AI in critical domains such as healthcare by introducing a formal framework for explanations that builds on concepts adopted only informally in prior work.
The paper tackles the explainability problem in machine learning by proving that no attribution algorithm can simultaneously satisfy four desirable properties. It formalizes sound explanation as providing sufficient information to causally explain a system's predictions, and applies it to cancer prediction models to build clinician trust.
We take a formal approach to the explainability problem of machine learning systems. We argue against the practice of interpreting black-box models by attributing scores to input components, because the goals of attribution-based interpretation inherently conflict. We prove that no attribution algorithm simultaneously satisfies specificity, additivity, completeness, and baseline invariance. We then formalize the concept of sound explanation, which has been informally adopted in prior work. A sound explanation provides sufficient information to causally explain the predictions made by a system. Finally, we present feature selection as a sound explanation for cancer prediction models, with the aim of cultivating trust among clinicians.
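To make the idea of feature selection as a sound explanation concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the method from the paper. It assumes a pipeline in which the predictor receives only the selected features, so the selected values are sufficient to determine the prediction. The function names, the linear scorer, the weights, and the sample input are all hypothetical.

```python
import random

def select_features(x, selected):
    """Keep only the features at the selected indices."""
    return [x[i] for i in selected]

def predict(x_selected, weights, bias):
    """Hypothetical linear scorer that sees only the selected features."""
    score = bias + sum(w * v for w, v in zip(weights, x_selected))
    return 1 if score > 0 else 0

def explained_predict(x, selected, weights, bias):
    """Selection first, then prediction on the selected values only."""
    return predict(select_features(x, selected), weights, bias)

def invariant_to_unselected(x, selected, weights, bias, trials=100, seed=0):
    """Check that randomizing unselected features never changes the output."""
    rng = random.Random(seed)
    baseline = explained_predict(x, selected, weights, bias)
    chosen = set(selected)
    for _ in range(trials):
        # Replace every unselected feature with a random value.
        perturbed = [v if i in chosen else rng.uniform(-10, 10)
                     for i, v in enumerate(x)]
        if explained_predict(perturbed, selected, weights, bias) != baseline:
            return False
    return True

# Hypothetical input with five features; only features 0 and 2 are selected.
x = [0.5, -1.2, 3.0, 0.7, 2.1]
selected = [0, 2]
weights = [1.0, -0.5]  # one assumed weight per selected feature
bias = 0.2

print(explained_predict(x, selected, weights, bias))
print(invariant_to_unselected(x, selected, weights, bias))
```

Because the predictor never sees the unselected features, the invariance check holds by construction. In this sketch, that is why the selected features can be read as a causal account of the prediction, rather than as a post hoc score assigned to each input.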