Explainable AI needs formalization
This paper addresses the need for more rigorous and effective explainable AI methods for researchers and practitioners. Its contribution is incremental, however, because it critiques existing approaches without introducing a new solution.
The paper argues that current explainable AI (XAI) methods are unreliable because they systematically attribute importance to input features that are independent of the prediction target, which limits their utility for tasks such as model diagnosis and scientific discovery. It proposes that researchers formally define the explanation problems they intend to solve and develop objective metrics to validate XAI algorithms.
The field of "explainable artificial intelligence" (XAI) seemingly addresses the desire that decisions of machine learning systems should be human-understandable. However, in its current state, XAI itself needs scrutiny. Popular methods cannot reliably answer relevant questions about ML models, their training data, or test inputs, because they systematically attribute importance to input features that are independent of the prediction target. This limits the utility of XAI for diagnosing and correcting data and models, for scientific discovery, and for identifying intervention targets. The fundamental reason for this is that current XAI methods do not address well-defined problems and are not evaluated against targeted criteria of explanation correctness. Researchers should formally define the problems they intend to solve and design methods accordingly. This will lead to diverse use-case-dependent notions of explanation correctness and objective metrics of explanation performance that can be used to validate XAI algorithms.
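The claim that importance can be attributed to features independent of the prediction target can be illustrated with a toy example. The following is a minimal sketch and is not taken from the paper: the data-generating process, the function name `suppressor_demo`, and the choice of raw linear model weights as the "importance" score are all assumptions made for illustration. The second feature is pure noise that is independent of the target, yet the best linear model needs a nonzero weight on it to cancel the noise contained in the first feature.

```python
import numpy as np


def suppressor_demo(n_samples=20000, seed=0):
    """Hypothetical toy setup: the target y equals a hidden signal z.

    Feature 0 is the signal plus a distractor (z + d).
    Feature 1 is the distractor alone (d), independent of y.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_samples)  # signal, identical to the target
    d = rng.standard_normal(n_samples)  # distractor, independent of z
    X = np.column_stack([z + d, d])
    y = z

    # Least-squares linear model; its weights stand in for "importance".
    w, *_ = np.linalg.lstsq(X, y, rcond=None)

    # Sample correlation of each feature with the target.
    corr = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    return w, corr


weights, correlations = suppressor_demo()
print("model weights:          ", weights)
print("correlation with target:", correlations)
```

In this construction the model recovers y as feature 0 minus feature 1, so feature 1 receives a weight of large magnitude even though its correlation with the target is close to zero. Whether that weight counts as a correct explanation depends on the question being asked, which is the kind of ambiguity a formal problem definition would resolve.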