Mathematical Foundation of Interpretable Equivariant Surrogate Models
This work offers a foundational approach to interpretability in machine learning. Its contribution is incremental, however, since it builds on existing theories of equivariant operators.
The paper tackles the problem of neural network explainability by introducing a mathematical framework for quantifying distances between equivariant operators and for defining interpretability according to user preferences. It then demonstrates an application to image classification with convolutional neural networks.
This paper introduces a rigorous mathematical framework for neural network explainability and, more broadly, for the explainability of equivariant operators called Group Equivariant Operators (GEOs), building on the theory of Group Equivariant Non-Expansive Operators (GENEOs). The central idea is to quantify the distance between GEOs by measuring the non-commutativity of specific diagrams. The paper also proposes a definition of the interpretability of GEOs in terms of a complexity measure that each user can choose according to their own preferences. Finally, we explore the formal properties of this framework and show how it can be applied in classical machine learning scenarios such as image classification with convolutional neural networks.
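The abstract does not spell out the construction, so what follows is only a toy sketch of what "measuring the non-commutativity of a diagram" can look like, under assumptions that are ours and not the paper's. Signals are finite lists of floats on a cyclic domain, the group is the cyclic group acting by shifts, and the GEOs are circular convolutions, which commute with shifts. The diagram is a square: a model operator F on top, a surrogate G on the bottom, and hypothetical translation maps `t_in` and `t_out` on the sides. The defect is the largest sup-norm gap between the two paths around the square over a finite sample of inputs. Every name below (`circular_conv`, `shift`, `diagram_defect`) is hypothetical and illustrative.

```python
from typing import Callable, List, Sequence

Signal = List[float]
Operator = Callable[[Signal], Signal]


def circular_conv(kernel: Sequence[float]) -> Operator:
    """Return the operator x -> circular convolution of x with `kernel`.

    Such an operator commutes with cyclic shifts, so it is equivariant
    with respect to the cyclic group acting on signals by shifting.
    """
    def op(x: Signal) -> Signal:
        n = len(x)
        return [sum(kernel[j] * x[(i - j) % n] for j in range(len(kernel)))
                for i in range(n)]
    return op


def shift(x: Signal, s: int) -> Signal:
    """Cyclic shift by s positions: result[i] = x[(i - s) % n]."""
    n = len(x)
    return [x[(i - s) % n] for i in range(n)]


def sup_dist(a: Signal, b: Signal) -> float:
    """Sup-norm distance between two signals of equal length."""
    return max(abs(u - v) for u, v in zip(a, b))


def diagram_defect(F: Operator, G: Operator,
                   t_in: Callable[[Signal], Signal],
                   t_out: Callable[[Signal], Signal],
                   samples: List[Signal]) -> float:
    """Largest discrepancy between the two paths of the square
    x -> F -> t_out  and  x -> t_in -> G, over a finite sample.
    A value of 0.0 means the diagram commutes on these samples."""
    return max(sup_dist(t_out(F(x)), G(t_in(x))) for x in samples)


def identity(x: Signal) -> Signal:
    return x


def shift_by_one(x: Signal) -> Signal:
    return shift(x, 1)


# Hypothetical "model" operator and a simpler surrogate for it.
F = circular_conv([0.5, 0.5])
G = circular_conv([0.5, 0.25, 0.25])
samples = [[1.0, 0.0, 0.0, 0.0], [0.0, 2.0, 1.0, 0.0]]

# With identity side maps, the defect is a sample-based distance F vs G.
print(diagram_defect(F, G, identity, identity, samples))
# With shifts on both sides and G = F, the square commutes because F is
# shift-equivariant, so the defect vanishes.
print(diagram_defect(F, F, shift_by_one, shift_by_one, samples))
```

With identity side maps the defect reduces to a plain sup-norm distance between F and G on the sample. Nontrivial side maps let the same quantity compare operators acting on different data spaces, which is the situation a surrogate model is meant to handle.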