CV · Mar 7, 2024

ComFe: An Interpretable Head for Vision Transformers

arXiv:2403.04125v6 · 2 citations · h-index: 28 · Has Code · Trans. Mach. Learn. Res.
Originality: Incremental advance
AI Analysis

This work addresses the need for scalable and efficient interpretable models in computer vision, offering a practical solution for applications requiring transparency without sacrificing performance.

The authors introduce ComFe, an interpretable head for Vision Transformers. It achieves performance competitive with non-interpretable methods on large-scale datasets such as ImageNet-1K, while improving robustness and using a consistent set of hyperparameters.

Interpretable computer vision models explain their classifications by comparing the distances between the local embeddings of an image and a set of prototypes that represent the training data. However, these approaches introduce additional hyperparameters that must be tuned for each new dataset, scale poorly, and are more computationally intensive to train than black-box approaches. In this work, we introduce Component Features (ComFe), a highly scalable interpretable-by-design image classification head for pretrained Vision Transformers (ViTs) that achieves performance competitive with comparable non-interpretable methods. To our knowledge, ComFe is the first interpretable head and, unlike other interpretable approaches, can be readily applied to large-scale datasets such as ImageNet-1K. Additionally, ComFe provides improved robustness and outperforms previous interpretable approaches on key benchmark datasets while using a consistent set of hyperparameters and without finetuning the pretrained ViT backbone. With only global image labels and no segmentation or part annotations, ComFe can identify consistent component features within an image and determine which of these features are informative in making a prediction. Code is available at github.com/emannix/comfe-component-features.
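
The general idea of comparing local embeddings against prototypes can be illustrated with a small toy sketch. This is not the ComFe implementation. It is a generic nearest-prototype classifier under stated assumptions: the function names, the use of cosine similarity, the max-pooling of similarities into class scores, and the 2-dimensional toy vectors are all hypothetical choices made for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def classify_by_prototypes(patch_embeddings, prototypes, prototype_classes):
    """Toy prototype-based classification (illustrative assumption, not ComFe).

    Each patch embedding is matched to its most similar prototype. The score
    of a class is the highest similarity between any patch and any prototype
    of that class. The per-patch assignments serve as a simple explanation of
    which image regions matched which prototype.
    """
    assignments = []
    class_scores = {}
    for patch in patch_embeddings:
        sims = [cosine_similarity(patch, proto) for proto in prototypes]
        best = max(range(len(prototypes)), key=lambda i: sims[i])
        assignments.append(best)
        for sim, cls in zip(sims, prototype_classes):
            class_scores[cls] = max(class_scores.get(cls, -1.0), sim)
    predicted = max(class_scores, key=class_scores.get)
    return predicted, assignments, class_scores


# Hypothetical toy data: two prototypes and three patch embeddings.
prototypes = [[1.0, 0.0], [0.0, 1.0]]
prototype_classes = ["cat", "dog"]
patches = [[0.9, 0.1], [0.8, 0.3], [0.2, 0.7]]

predicted, assignments, class_scores = classify_by_prototypes(
    patches, prototypes, prototype_classes
)
print(predicted, assignments)
```

In a real system the patch embeddings would come from a pretrained ViT backbone and the prototypes would be learned from training data. The sketch only shows how a prediction can be traced back to individual patch-to-prototype matches.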

Code Implementations: 1 repo