Towards Self-Explainability of Deep Neural Networks with Heatmap Captioning and Large-Language Models
This work aims to make heatmap-based AI explanations for computer vision more interactive and scalable for users; the contribution is incremental, as it builds on existing heatmap techniques.
The paper tackles the lack of automation and accessibility in heatmap-based explainable AI. It proposes a framework with a context modelling module and a reasoning module: template-based captioning converts the heatmap and input into text, and a large language model generates explanations from that text. Qualitative experiments are reported as showing the framework's effectiveness.
Heatmaps are widely used to interpret deep neural networks, particularly for computer vision tasks, and heatmap-based explainable AI (XAI) is a well-researched topic. However, most studies concentrate on enhancing the quality of the generated heatmap or on discovering alternative heatmap generation techniques, and little effort has been devoted to making heatmap-based XAI automatic, interactive, scalable, and accessible. To address this gap, we propose a framework that includes two modules: (1) context modelling and (2) reasoning. For context modelling, we propose a template-based image captioning approach that creates text-based contextual information from the heatmap and the input data. The reasoning module leverages a large language model to provide explanations in combination with specialised knowledge. Our qualitative experiments demonstrate the effectiveness of our framework and heatmap captioning approach. The code for the proposed template-based heatmap captioning approach will be publicly available.
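The two-module idea can be illustrated with a minimal sketch. This is not the authors' implementation: the template wording, the 0.5 threshold, the 3x3 position grid, and the function names `caption_heatmap` and `build_reasoning_prompt` are all assumptions made for illustration, and the language model call itself is left out.

```python
from typing import Sequence

# Hypothetical template; the abstract does not give the paper's actual templates.
TEMPLATE = (
    "The model predicted '{label}'. The most salient region is in the "
    "{vertical} {horizontal} of the image and covers {coverage:.0%} of it."
)


def caption_heatmap(heatmap: Sequence[Sequence[float]], label: str,
                    threshold: float = 0.5) -> str:
    """Context modelling sketch: fill a fixed template from a 2-D heatmap.

    Assumes heatmap values are already normalised to [0, 1].
    """
    rows, cols = len(heatmap), len(heatmap[0])
    # Collect the cells whose relevance reaches the (assumed) threshold.
    hot = [(r, c) for r in range(rows) for c in range(cols)
           if heatmap[r][c] >= threshold]
    if not hot:
        return (f"The model predicted '{label}'. "
                "No region exceeds the relevance threshold.")
    # Centroid of the salient cells, normalised to the unit square.
    cy = sum(r + 0.5 for r, _ in hot) / len(hot) / rows
    cx = sum(c + 0.5 for _, c in hot) / len(hot) / cols
    # Map the centroid onto a coarse 3x3 grid of position words.
    vertical = "top" if cy < 1 / 3 else "bottom" if cy > 2 / 3 else "middle"
    horizontal = "left" if cx < 1 / 3 else "right" if cx > 2 / 3 else "centre"
    return TEMPLATE.format(label=label, vertical=vertical,
                           horizontal=horizontal,
                           coverage=len(hot) / (rows * cols))


def build_reasoning_prompt(caption: str, domain_knowledge: str,
                           question: str) -> str:
    """Reasoning sketch: combine caption, specialised knowledge, and a user
    question into one prompt string for a large language model."""
    return (
        "Heatmap description:\n" + caption + "\n\n"
        "Domain knowledge:\n" + domain_knowledge + "\n\n"
        "Question:\n" + question + "\n"
        "Explain the prediction using only the information above."
    )


if __name__ == "__main__":
    toy_heatmap = [
        [0.9, 0.8, 0.0, 0.0],
        [0.7, 0.9, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0],
    ]
    caption = caption_heatmap(toy_heatmap, "cat")
    print(build_reasoning_prompt(caption, "Cats have pointed ears.",
                                 "Why did the model predict 'cat'?"))
```

The resulting prompt string would then be passed to whichever language model is available. Keeping the captioning step purely template-based makes its output deterministic and easy to inspect, which is one plausible reason to prefer templates over a learned captioner in this setting.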