CV · LG · Dec 6, 2024

How to Squeeze An Explanation Out of Your Model

Tiago Roxo, Joana C. Costa, Pedro R. M. Inácio, Hugo Proença

arXiv:2412.05134v1 · 3.71 citations · h-index: 5 · ECCV Workshops

Originality: Incremental advance

AI Analysis

The work addresses the need for interpretability in sensitive applications such as biometrics and healthcare, though it is incremental because it builds on existing SE block techniques.

The paper tackles the lack of interpretability in deep learning models, especially in sensitive domains like biometrics. It proposes a model-agnostic approach that uses Squeeze and Excitation blocks to generate visual attention heatmaps, achieving results competitive with existing methods without compromising model performance.

Deep learning models are widely used for their reliability across a variety of tasks. However, they typically do not provide the reasoning behind their decisions, which is a significant drawback, particularly in more sensitive areas such as biometrics, security, and healthcare. The most common interpretability approaches create visual attention heatmaps of the regions of interest in an image, based on gradient backpropagation through the model. Although this is a viable approach, current methods target image settings and default/standard deep learning models, so they require significant adaptation to work in video/multi-modal settings and with custom architectures. This paper proposes a model-agnostic approach to interpretability, based on a novel use of the Squeeze and Excitation (SE) block, that creates visual attention heatmaps. By including an SE block prior to the classification layer of any model, we are able to retrieve the most influential features via manipulation of the SE vector, one of the key components of the SE block. Our results show that this new SE-based interpretability can be applied to various models in image and video/multi-modal settings, namely facial-feature biometrics with CelebA and behavioral biometrics with Active Speaker Detection datasets. Furthermore, our proposal does not compromise model performance on the original task and is competitive with current interpretability approaches on state-of-the-art object datasets, highlighting its robustness to data beyond the biometric context.
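
To make the idea in the abstract concrete, the following is a minimal, dependency-free sketch of a standard SE block placed just before a classifier: a squeeze step (global average pooling), an excitation step (a small bottleneck network ending in a sigmoid) that yields the SE vector, and a ranking of channels by their SE weight. The abstract does not specify how the SE vector is manipulated or how the heatmap is built, so `top_channels` and `heatmap` are hypothetical helpers, and the random weights, tensor sizes, and reduction ratio are illustrative assumptions rather than the authors' configuration.

```python
import math
import random


def squeeze(feature_maps):
    """Squeeze: global average pooling, one scalar per channel (C x H x W -> C)."""
    return [sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
            for fmap in feature_maps]


def dense(vector, weights):
    """Bias-free fully connected layer; `weights` has shape out x in."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def excite(squeezed, w_reduce, w_expand):
    """Excitation: FC -> ReLU -> FC -> sigmoid, giving one weight in (0, 1) per channel."""
    hidden = [max(0.0, h) for h in dense(squeezed, w_reduce)]
    return [1.0 / (1.0 + math.exp(-z)) for z in dense(hidden, w_expand)]


def top_channels(se_vector, k):
    """Hypothetical helper: indices of the k channels with the largest SE weights."""
    order = sorted(range(len(se_vector)), key=lambda c: se_vector[c], reverse=True)
    return order[:k]


def heatmap(feature_maps, se_vector, k):
    """Hypothetical heatmap (an assumption, not the paper's procedure):
    SE-weighted sum of the top-k channel maps, min-max normalised to [0, 1]."""
    chosen = top_channels(se_vector, k)
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    raw = [[sum(se_vector[c] * feature_maps[c][i][j] for c in chosen)
            for j in range(width)] for i in range(height)]
    lo = min(min(row) for row in raw)
    hi = max(max(row) for row in raw)
    span = (hi - lo) or 1.0  # avoid division by zero on a constant map
    return [[(v - lo) / span for v in row] for row in raw]


# Illustrative sizes and random weights; a trained model would supply both.
rng = random.Random(0)
channels, height, width, reduction = 8, 4, 4, 2
features = [[[rng.random() for _ in range(width)] for _ in range(height)]
            for _ in range(channels)]
w_reduce = [[rng.uniform(-1, 1) for _ in range(channels)]
            for _ in range(channels // reduction)]
w_expand = [[rng.uniform(-1, 1) for _ in range(channels // reduction)]
            for _ in range(channels)]

se_vector = excite(squeeze(features), w_reduce, w_expand)
influential = top_channels(se_vector, k=3)
attention = heatmap(features, se_vector, k=3)
```

In a full SE block, each channel of `features` would also be multiplied by its entry in `se_vector` before being passed to the classification layer; that scaling step is omitted here because only the SE vector is needed to rank channels.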
