CV | AI | Nov 7, 2023

A Simple Interpretable Transformer for Fine-Grained Image Classification and Analysis

Microsoft
arXiv:2311.04157v3 | 27 citations | h-index: 42 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses the need for interpretability in image classification, particularly for fine-grained analysis. The advance is incremental because it builds on the existing Transformer and DETR frameworks.

The authors make image classification interpretable with a proactive approach in which each class searches for itself in an image using a Transformer encoder-decoder. The resulting cross-attention weights provide a faithful interpretation of each prediction, and the method is demonstrated on eight datasets for fine-grained classification.

We present a novel usage of Transformers to make image classification interpretable. Unlike mainstream classifiers that wait until the last fully connected layer to incorporate class information to make predictions, we investigate a proactive approach, asking each class to search for itself in an image. We realize this idea via a Transformer encoder-decoder inspired by DEtection TRansformer (DETR). We learn "class-specific" queries (one for each class) as input to the decoder, enabling each class to localize its patterns in an image via cross-attention. We name our approach INterpretable TRansformer (INTR), which is fairly easy to implement and exhibits several compelling properties. We show that INTR intrinsically encourages each class to attend distinctively; the cross-attention weights thus provide a faithful interpretation of the prediction. Interestingly, via "multi-head" cross-attention, INTR could identify different "attributes" of a class, making it particularly suitable for fine-grained classification and analysis, which we demonstrate on eight datasets. Our code and pre-trained models are publicly accessible at the Imageomics Institute GitHub site: https://github.com/Imageomics/INTR.
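The idea of class-specific queries attending over an image can be illustrated with a minimal sketch. This is not the INTR implementation: it assumes a single attention head, no learned key/value projections, a single decoder step, and a hypothetical shared scoring vector (`shared_w`) that turns each class's attended feature into a logit. All names and sizes below are illustrative, and the "learned" queries are random placeholders.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def class_query_cross_attention(class_queries, patch_features, shared_w):
    """Let each class query attend over the image patches.

    class_queries:  C vectors of size d, one query per class
    patch_features: N vectors of size d (stand-in for encoder outputs)
    shared_w:       vector of size d (hypothetical shared scoring vector)

    Returns (logits, attention); attention[c] is class c's
    distribution over the N patches.
    """
    d = len(shared_w)
    scale = 1.0 / math.sqrt(d)
    logits, attention = [], []
    for q in class_queries:
        # Scaled dot-product scores between this class and every patch.
        weights = softmax([dot(q, k) * scale for k in patch_features])
        # Weighted sum of patch features (keys double as values here).
        pooled = [
            sum(w * v[j] for w, v in zip(weights, patch_features))
            for j in range(d)
        ]
        logits.append(dot(pooled, shared_w))
        attention.append(weights)
    return logits, attention


# Toy setup: 3 classes, 6 patches, feature size 4 (all values random).
random.seed(0)
num_classes, num_patches, dim = 3, 6, 4
queries = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_classes)]
patches = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_patches)]
w = [random.gauss(0, 1) for _ in range(dim)]

logits, attention = class_query_cross_attention(queries, patches, w)
predicted = max(range(num_classes), key=lambda c: logits[c])
```

In this sketch, `attention[predicted]` is the per-patch weight vector for the winning class, which is the kind of quantity the abstract describes as an interpretation of the prediction. A multi-head variant would compute one such weight vector per head.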

Code Implementations: 1 repo
