A Theoretical Framework for AI Models Explainability with Application in Biomedicine
This work addresses foundational issues in XAI for researchers and practitioners, particularly in biomedicine, but is incremental as it synthesizes existing literature into a new framework.
The authors tackle the lack of shared terminology and structural soundness in Explainable AI (XAI) by proposing a novel theoretical framework that defines an explanation as the combination of model evidence and its human interpretation, and that emphasizes the properties of faithfulness and plausibility.
Explainable Artificial Intelligence (XAI) is a vibrant research topic in the artificial intelligence community, with growing interest across methods and domains. Much has been written about the subject, yet XAI still lacks shared terminology and a framework capable of providing structural soundness to explanations. In our work, we address these issues by proposing a novel definition of explanation that synthesizes what can be found in the literature. We recognize that explanations are not atomic but are the combination of evidence stemming from the model and its input-output mapping, and the human interpretation of this evidence. Furthermore, we characterize explanations in terms of the properties of faithfulness (i.e., the explanation being a true description of the model's inner workings and decision-making process) and plausibility (i.e., how convincing the explanation looks to the user). Our proposed theoretical framework simplifies how these properties are operationalized and provides new insight into common explanation methods, which we analyze as case studies.