Counterfactual Gradients-based Quantification of Prediction Trust in Neural Networks
This work addresses the need for objective trust quantification in deep learning for applications like image and video recognition, though it appears incremental as it builds on existing gradient-based methods.
The paper tackles the problem of quantifying prediction trust in neural networks by proposing GradTrust, a method based on the variance of counterfactual gradients. It shows that GradTrust outperforms existing techniques at detecting mispredictions on the ImageNet and Kinetics-400 datasets, ranking among the Top-2 methods in 37 of 38 experimental modalities.
The widespread adoption of deep neural networks in machine learning calls for an objective quantification of esoteric trust. In this paper, we propose GradTrust, a classification trust measure for large-scale neural networks at inference. The proposed method utilizes the variance of counterfactual gradients, i.e., the changes in the network parameters that would be required if the label were different. We show that GradTrust is superior to existing techniques at detecting mispredictions on $50000$ images from the ImageNet validation dataset. Depending on the network, GradTrust detects images where either the ground truth is incorrect or ambiguous, or the classes are co-occurring. We extend GradTrust to video action recognition on the Kinetics-400 dataset. We showcase results on $14$ architectures pretrained on ImageNet and $5$ architectures pretrained on Kinetics-400. We observe the following: (i) simple methodologies like negative log likelihood and margin classifiers outperform state-of-the-art uncertainty and out-of-distribution detection techniques at misprediction detection, and (ii) the proposed GradTrust is among the Top-2 performing methods on $37$ of the $38$ considered experimental modalities. The code is available at: https://github.com/olivesgatech/GradTrust
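To make the idea of counterfactual gradients concrete, the following is a minimal sketch and not the authors' implementation (see the linked repository for that). It assumes a linear last layer with cross-entropy loss, for which the gradient with respect to the weights under a counterfactual label $c$ is $(p - e_c)\,h^\top$, where $p$ is the softmax output, $e_c$ the one-hot vector for $c$, and $h$ the penultimate features. The score shown (variance of gradient norms over all non-predicted labels) and the function names are illustrative assumptions; the abstract does not specify the exact GradTrust formula. The negative log likelihood and margin baselines mentioned in observation (i) are included for comparison.

```python
import math
from statistics import pvariance


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def counterfactual_grad_norms(logits, features):
    """Frobenius norm of dL/dW for every candidate label c.

    Assumption: logits = W @ features with cross-entropy loss, so
    dL/dW = outer(p - onehot(c), features), whose norm is the product
    of the two vector norms.
    """
    probs = softmax(logits)
    feat_norm = math.sqrt(sum(f * f for f in features))
    norms = []
    for c in range(len(logits)):
        residual = [p - (1.0 if k == c else 0.0) for k, p in enumerate(probs)]
        res_norm = math.sqrt(sum(r * r for r in residual))
        norms.append(res_norm * feat_norm)
    return norms


def gradient_variance_score(logits, features):
    """Hypothetical score: variance of gradient norms over the
    counterfactual labels (every label except the predicted one)."""
    pred = max(range(len(logits)), key=lambda k: logits[k])
    norms = counterfactual_grad_norms(logits, features)
    return pvariance([n for c, n in enumerate(norms) if c != pred])


def nll(logits):
    """Baseline: negative log likelihood of the predicted class."""
    return -math.log(max(softmax(logits)))


def margin(logits):
    """Baseline: gap between the two largest softmax probabilities."""
    top = sorted(softmax(logits), reverse=True)
    return top[0] - top[1]


if __name__ == "__main__":
    feats = [1.0, 2.0]            # toy penultimate features
    confident = [8.0, 0.0, 0.0]   # one class clearly dominates
    ambiguous = [2.0, 1.9, 0.0]   # two classes nearly tied
    for name, z in (("confident", confident), ("ambiguous", ambiguous)):
        print(name, gradient_variance_score(z, feats), nll(z), margin(z))
```

In this toy setting the nearly tied prediction yields a larger spread of counterfactual gradient norms, a higher negative log likelihood, and a smaller margin than the confident one. How such quantities are combined into a trust score for full networks is left to the paper and its code.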