Hidden Monotonicity: Explaining Deep Neural Networks via their DC Decomposition
This work aims to improve the explainability of deep neural networks for practitioners and researchers. It offers incremental advances by building on existing monotonicity concepts to strengthen interpretability methods.
The paper uses monotonicity to explain deep neural networks and proposes two methods. The first adapts the decomposition of a ReLU network into two monotone convex parts to improve saliency methods, achieving state-of-the-art results on ImageNet-S with VGG16 and Resnet18 across all Quantus saliency metric categories. The second trains models as the difference between two monotone networks to obtain self-explainability.
It has been demonstrated in various contexts that monotonicity leads to better explainability in neural networks. However, not every function can be well approximated by a monotone neural network. We demonstrate that monotonicity can still be used in two ways to boost explainability. First, we adapt the decomposition of a trained ReLU network into two monotone and convex parts so as to overcome the numerical obstacles caused by an inherent blowup of the weights in this procedure. Our proposed saliency methods, SplitCAM and SplitLRP, improve on state-of-the-art results on both VGG16 and Resnet18 networks on ImageNet-S across all Quantus saliency metric categories. Second, we show that training a model as the difference between two monotone neural networks results in a system with strong self-explainability properties.
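As a rough illustration of the first idea, the sketch below applies the textbook difference-of-convex (DC) split to a tiny ReLU network in plain Python. It is the unadapted construction, not the paper's adapted procedure, and it does not implement SplitCAM or SplitLRP. The function names (`split_weights`, `dc_forward`) and the toy weights are assumptions made only for this example. The split rests on two identities: a weight matrix can be written as W = W⁺ − W⁻ with both parts entrywise non-negative, and ReLU(p − q) = max(p, q) − q.

```python
# Sketch: write a small ReLU MLP f as f = g - h, where g and h are built
# only from non-negative weights and max, hence convex and coordinate-wise
# non-decreasing in the input. Names and toy weights are illustrative.

def split_weights(W):
    """Return (W_pos, W_neg) with W = W_pos - W_neg, both entrywise >= 0."""
    W_pos = [[max(w, 0.0) for w in row] for row in W]
    W_neg = [[max(-w, 0.0) for w in row] for row in W]
    return W_pos, W_neg

def matvec(W, v):
    """Plain matrix-vector product for lists of lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def forward(layers, x):
    """Ordinary forward pass: ReLU after every layer except the last."""
    a = list(x)
    for i, (W, b) in enumerate(layers):
        z = [s + bi for s, bi in zip(matvec(W, a), b)]
        a = z if i == len(layers) - 1 else [max(v, 0.0) for v in z]
    return a

def dc_forward(layers, x):
    """Propagate a pair (g, h) such that the activation equals g - h."""
    g, h = list(x), [0.0] * len(x)  # input layer: x = x - 0
    for i, (W, b) in enumerate(layers):
        W_pos, W_neg = split_weights(W)
        # Pre-activation z = p - q; p and q use only non-negative weights.
        p = [u + v + max(bi, 0.0)
             for u, v, bi in zip(matvec(W_pos, g), matvec(W_neg, h), b)]
        q = [u + v + max(-bi, 0.0)
             for u, v, bi in zip(matvec(W_neg, g), matvec(W_pos, h), b)]
        if i == len(layers) - 1:
            g, h = p, q  # final layer is linear, no ReLU
        else:
            # ReLU(p - q) = max(p, q) - q
            g, h = [max(pi, qi) for pi, qi in zip(p, q)], q
    return g, h

# Toy two-layer network (hypothetical weights).
layers = [
    ([[1.0, -2.0], [-1.0, 0.5]], [0.5, -0.25]),
    ([[2.0, -3.0]], [0.1]),
]
x = [2.0, 0.5]
g, h = dc_forward(layers, x)
f = forward(layers, x)
# g[0] - h[0] agrees with f[0] up to floating-point rounding.
```

In this toy run the two parts are roughly 11.85 and 8.75 while their difference is about 3.1. Even at depth two, g and h are larger in magnitude than the function they represent, which suggests why a naive split may become numerically delicate for deep networks.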