CV, AI, LG | Apr 10, 2022

Explaining Deep Convolutional Neural Networks via Latent Visual-Semantic Filter Attention

NVIDIA, U of Toronto
arXiv:2204.04601v1 | 20 citations | h-index: 23
Originality: Incremental advance
AI Analysis

This work addresses interpretability for researchers and users of visual models, offering a general framework for explaining learned representations, though it is incremental as it builds on existing mapping techniques.

The authors tackle the challenge of generating semantic explanations for deep convolutional neural networks without direct supervision. They propose LaViSE, which teaches any CNN to produce text descriptions of its latent filter representations, and demonstrate that it can generate novel descriptions beyond the training categories and support unsupervised dataset bias analysis.

Interpretability is an important property for visual models, as it helps researchers and users understand the internal mechanism of a complex model. However, generating semantic explanations about the learned representation is challenging without direct supervision to produce such explanations. We propose a general framework, Latent Visual Semantic Explainer (LaViSE), to teach any existing convolutional neural network to generate text descriptions about its own latent representations at the filter level. Our method constructs a mapping between the visual and semantic spaces using the images and category names of generic image datasets. It then transfers the mapping to the target domain, which does not have semantic labels. The proposed framework employs a modular structure and makes it possible to analyze any trained network, whether or not its original training data is available. We show that our method can generate novel descriptions for learned filters beyond the set of categories defined in the training dataset, and we perform an extensive evaluation on multiple datasets. We also demonstrate a novel application of our method for unsupervised dataset bias analysis, which allows us to automatically discover hidden biases in datasets or compare different subsets without using additional labels. The dataset and code are made public to facilitate further research.
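
The core idea in the abstract, mapping a filter's visual features into a semantic (word) space and reading off nearby words, can be illustrated with a toy sketch. This is not the authors' implementation: the linear map `W`, the tiny hand-written word vectors, and the helper names `project` and `describe_filter` are all hypothetical, chosen only to show how a description can land on a word ("puppy") that was never a training category.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def project(W, feature):
    """Map a visual feature into the semantic space (assumed linear here)."""
    return [sum(w * x for w, x in zip(row, feature)) for row in W]


def describe_filter(W, filter_feature, vocab, top_k=2):
    """Return the top_k vocabulary words closest to the projected feature."""
    z = project(W, filter_feature)
    ranked = sorted(vocab, key=lambda word: cosine(z, vocab[word]), reverse=True)
    return ranked[:top_k]


# Toy word embeddings (made up). "puppy" stands in for a word that is in the
# embedding vocabulary but was not a labeled category during training.
vocab = {
    "dog": [1.0, 0.0, 0.0],
    "cat": [0.0, 1.0, 0.0],
    "puppy": [0.9, 0.1, 0.1],
    "car": [0.0, 0.0, 1.0],
}

# Hypothetical mapping from a 2-D visual feature to the 3-D semantic space;
# in practice such a map would be learned from images and category names.
W = [
    [1.0, 0.0],
    [0.0, 1.0],
    [0.0, 0.0],
]

# Hypothetical pooled activation of one filter on its top-activating images.
filter_feature = [0.95, 0.05]

print(describe_filter(W, filter_feature, vocab))  # ['dog', 'puppy']
```

Because retrieval runs over the whole embedding vocabulary rather than only the labeled categories, the nearest words can include terms outside the training label set, which is the property the abstract highlights.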

Code Implementations: 1 repo
