CV · LG · Dec 6, 2024

Sparse autoencoders reveal selective remapping of visual concepts during adaptation

arXiv:2412.05276v2 · 50 citations · h-index: 10 · ICLR
Originality: Incremental advance
AI Analysis

This work gives machine learning practitioners insight into adaptation mechanisms, but the advance is incremental: it builds on existing methods to analyze known bottlenecks.

The researchers study how foundation models adapt to downstream tasks. They develop PatchSAE, a sparse autoencoder for CLIP vision transformers that extracts interpretable visual concepts such as shape and color, and find that most adaptation gains can be explained by concepts already present in the non-adapted model rather than by new ones.

Adapting foundation models for specific purposes has become a standard approach to build machine learning systems for downstream applications. Yet, it is an open question which mechanisms take place during adaptation. Here we develop a new Sparse Autoencoder (SAE) for the CLIP vision transformer, named PatchSAE, to extract interpretable concepts at granular levels (e.g., shape, color, or semantics of an object) and their patch-wise spatial attributions. We explore how these concepts influence the model output in downstream image classification tasks and investigate how recent state-of-the-art prompt-based adaptation techniques change the association of model inputs to these concepts. While activations of concepts slightly change between adapted and non-adapted models, we find that the majority of gains on common adaptation tasks can be explained with the existing concepts already present in the non-adapted foundation model. This work provides a concrete framework to train and use SAEs for Vision Transformers and provides insights into explaining adaptation mechanisms.
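To make the idea of patch-wise concept activations concrete, the following is a minimal NumPy sketch of a generic ReLU sparse autoencoder applied to ViT patch tokens. It is not the authors' PatchSAE implementation: the architecture (tied-initialised encoder and decoder, L1 sparsity penalty), the function names, the dimensions, and the random stand-in for CLIP activations are all illustrative assumptions.

```python
import numpy as np


def init_sae(d_model, n_latents, seed=0):
    """Create parameters for a small ReLU sparse autoencoder (hypothetical layout)."""
    rng = np.random.default_rng(seed)
    w_enc = rng.normal(0.0, 0.02, size=(d_model, n_latents))
    return {
        "w_enc": w_enc,
        "b_enc": np.zeros(n_latents),
        "w_dec": w_enc.T.copy(),  # decoder initialised as the encoder transpose
        "b_dec": np.zeros(d_model),
    }


def sae_forward(params, tokens, l1_coeff=1e-3):
    """Encode patch tokens into sparse latents, reconstruct them, and compute the loss.

    tokens: array of shape (num_patches, d_model), one row per image patch.
    """
    centered = tokens - params["b_dec"]
    latents = np.maximum(centered @ params["w_enc"] + params["b_enc"], 0.0)
    recon = latents @ params["w_dec"] + params["b_dec"]
    mse = np.mean((recon - tokens) ** 2)
    l1 = np.mean(np.sum(np.abs(latents), axis=-1))  # sparsity penalty per patch
    return latents, recon, mse + l1_coeff * l1


def patch_attribution_map(latents, latent_idx, grid_size):
    """Reshape one latent's per-patch activations into a spatial grid."""
    return latents[:, latent_idx].reshape(grid_size, grid_size)


# Random activations stand in for CLIP ViT patch tokens (assumption, not real data).
grid_size, d_model, n_latents = 7, 16, 64
rng = np.random.default_rng(1)
tokens = rng.normal(size=(grid_size * grid_size, d_model))

params = init_sae(d_model, n_latents)
latents, recon, loss = sae_forward(params, tokens)
heatmap = patch_attribution_map(latents, latent_idx=3, grid_size=grid_size)

# One possible image-level summary: average each latent over all patches (assumption).
image_level = latents.mean(axis=0)
```

Here `heatmap` shows where in the image a single latent fires, and `image_level` gives one activation per latent for the whole image. Comparing such activations between an adapted and a non-adapted model is one way to picture the kind of analysis the abstract describes.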

Code Implementations: 1 repo