CV · Aug 3, 2019

Smooth Grad-CAM++: An Enhanced Inference Level Visualization Technique for Deep Convolutional Neural Network Models

Daniel Omeiza, Skyler Speakman, Celia Cintas, Komminist Weldermariam

arXiv:1908.01224v1 26.2285 citations

Originality: Synthesis-oriented

AI Analysis

This is an incremental improvement aimed at computer vision researchers and decision makers seeking explainable AI. It addresses specific limitations of existing CNN visualization techniques such as Grad-CAM.

The paper tackles poor localization and low visual sharpness in existing CNN visualization methods such as Grad-CAM, which struggle with multiple occurrences of the same class and can fail to capture an entire object. The result is Smooth Grad-CAM++, which produced visually sharper maps and better object localization in experiments on a few images.

Gaining insight into how deep convolutional neural network models perform image classification, and how to explain their outputs, has been a concern for computer vision researchers and decision makers. These deep models are often referred to as black boxes because their internal workings are poorly understood. In an effort to develop explainable deep learning models, several methods have been proposed, such as finding gradients of the class output with respect to the input image (sensitivity maps), class activation maps (CAM), and Gradient-based Class Activation Maps (Grad-CAM). These methods underperform when localizing multiple occurrences of the same class and do not work for all CNNs. In addition, Grad-CAM does not capture the entire object when used on single-object images, which affects performance on recognition tasks. With the intention of creating an enhanced visual explanation in terms of visual sharpness, object localization, and explaining multiple occurrences of objects in a single image, we present Smooth Grad-CAM++ (simple demo: http://35.238.22.135:5000/), a technique that combines methods from two other recent techniques: SMOOTHGRAD and Grad-CAM++. Our Smooth Grad-CAM++ technique can visualize a layer, a subset of feature maps, or a subset of neurons within a feature map at each instance at the inference level (the model prediction process). In experiments with a few images, Smooth Grad-CAM++ produced more visually sharp maps with better localization of objects in the given input images than the other methods.
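The abstract does not spell out how the two techniques are combined, so the following is only a minimal NumPy sketch of one plausible reading: compute a Grad-CAM++ map for several noisy copies of the input (the SMOOTHGRAD idea) and average the results. Everything here is an assumption made for illustration: the toy model (`feature_maps`, a single convolution plus ReLU, followed by a global-average-pool head with an exponential class score), the function names, and the defaults `n_samples` and `sigma`. The authors' formulation may differ, for example in where the averaging happens.

```python
import numpy as np


def feature_maps(image, filters):
    """Toy stand-in for a last conv layer: valid cross-correlation, then ReLU."""
    k = filters.shape[-1]
    windows = np.lib.stride_tricks.sliding_window_view(image, (k, k))
    return np.maximum(np.einsum("ijab,cab->cij", windows, filters), 0.0)


def grad_cam_pp(acts, class_weights):
    """Grad-CAM++ map for the toy head S = sum_k w_k * mean(A^k), Y = exp(S)."""
    n_pix = acts.shape[1] * acts.shape[2]
    # dS/dA^k_ij is constant for a global-average-pool + linear head.
    g = np.broadcast_to((class_weights / n_pix)[:, None, None], acts.shape)
    # With Y = exp(S), the 2nd and 3rd derivatives are Y*g^2 and Y*g^3,
    # so Y cancels out of the pixel-wise coefficients alpha.
    denom = 2.0 * g**2 + acts.sum(axis=(1, 2), keepdims=True) * g**3
    alpha = g**2 / np.where(denom != 0.0, denom, 1.0)
    score = np.exp(np.sum(class_weights * acts.mean(axis=(1, 2))))
    # Channel weights: alpha-weighted sum of the positive first derivatives.
    weights = (alpha * np.maximum(score * g, 0.0)).sum(axis=(1, 2))
    return np.maximum((weights[:, None, None] * acts).sum(axis=0), 0.0)


def smooth_grad_cam_pp(image, filters, class_weights,
                       n_samples=25, sigma=0.1, seed=0):
    """Average Grad-CAM++ maps over Gaussian-perturbed copies of the input."""
    rng = np.random.default_rng(seed)
    # SMOOTHGRAD-style noise level: a fraction of the input's value range.
    noise_std = sigma * (image.max() - image.min())
    maps = []
    for _ in range(n_samples):
        noisy = image + rng.normal(0.0, noise_std, size=image.shape)
        maps.append(grad_cam_pp(feature_maps(noisy, filters), class_weights))
    cam = np.mean(maps, axis=0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam  # scale to [0, 1] for display


# Hypothetical toy input: a bright square on a dark background.
image = np.zeros((12, 12))
image[4:8, 4:8] = 1.0
filters = np.stack([
    np.full((3, 3), 1 / 9),                               # blur filter
    np.array([[1, 1, 1], [0, 0, 0], [-1, -1, -1]]) / 3,   # edge filter
])
class_weights = np.array([1.0, 0.5])
cam = smooth_grad_cam_pp(image, filters, class_weights)
print(cam.shape, float(cam.max()))
```

In a real CNN the derivatives with respect to the feature maps would come from automatic differentiation rather than the closed form used for this toy head. The closed form keeps the sketch free of any deep learning framework.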
