LGMay 19, 2022

Cardinality-Minimal Explanations for Monotonic Neural Networks

Ouns El Harzli, Bernardo Cuenca Grau, Ian Horrocks

Oxford

arXiv:2205.09901v311.811 citationsh-index: 91

Originality Incremental advance

AI Analysis

This work addresses the challenge of providing precise explanations for neural model predictions, which is crucial for interpretability in AI applications, though it is incremental as it builds on existing explanation methods by adding constraints for tractability.

The paper tackles the intractability of computing minimal feature subsets for explaining neural network predictions by focusing on monotonic neural networks, showing that with additional assumptions on activation functions, these problems become solvable in polynomial time using greedy algorithms, with experiments indicating favorable performance.

In recent years, there has been increasing interest in explanation methods for neural model predictions that offer precise formal guarantees. These include abductive (respectively, contrastive) methods, which aim to compute minimal subsets of input features that are sufficient for a given prediction to hold (respectively, to change a given prediction). The corresponding decision problems are, however, known to be intractable. In this paper, we investigate whether tractability can be regained by focusing on neural models implementing a monotonic function. Although the relevant decision problems remain intractable, we can show that they become solvable in polynomial time by means of greedy algorithms if we additionally assume that the activation functions are continuous everywhere and differentiable almost everywhere. Our experiments suggest favourable performance of our algorithms.

View on arXiv PDF

Similar