Categorical Foundations of Explainable AI: A Unifying Theory
This foundational work addresses the need for safe and reliable AI systems by offering a unifying theory that could impact the entire field of XAI, though it is incremental in that it provides formal definitions rather than new empirical results.
The paper tackles the lack of mathematical formalization in Explainable AI (XAI) by providing the first rigorous definitions of key XAI notions, such as "explanation", using category theory. The result is a framework that models learning schemes and establishes a theoretical basis for XAI taxonomies.
Explainable AI (XAI) aims to address the human need for safe and reliable AI systems. However, numerous surveys emphasize the absence of a sound mathematical formalization of key XAI notions, notably including the term "explanation", which still lacks a precise definition. To bridge this gap, this paper presents the first mathematically rigorous definitions of key XAI notions and processes, using the well-founded formalism of category theory. We show that our categorical framework allows us to: (i) model existing learning schemes and architectures, (ii) formally define the term "explanation", (iii) establish a theoretical basis for XAI taxonomies, and (iv) analyze commonly overlooked aspects of explanation methods. As a consequence, our categorical framework promotes the ethical and secure deployment of AI technologies, as it represents a significant step towards a sound theoretical foundation of explainable AI.