Explaining the Impact of Training on Vision Models via Activation Clustering
The work gives researchers and practitioners a tool for post-hoc inspection and transparency of vision models. Its contribution is incremental, since it builds on existing activation-analysis methods.
The paper addresses the problem of understanding the internal representations of vision models. It introduces NAVE, a method that clusters feature activations to extract and visualize learned semantics without fine-tuning, shows that the resulting concepts align with image semantics, and reveals how training strategies affect performance.
This paper introduces Neuro-Activated Vision Explanations (NAVE), a method for extracting and visualizing the internal representations of vision model encoders. By clustering feature activations, NAVE provides insights into learned semantics without fine-tuning. Using object localization, we show that NAVE's concepts align with image semantics. Through extensive experiments, we analyze the impact of training strategies and architectures on encoder representation capabilities. Additionally, we apply NAVE to study training artifacts in vision transformers and reveal how weak training strategies and spurious correlations degrade model performance. Our findings establish NAVE as a valuable tool for post-hoc model inspection and improving transparency in vision models.
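As a rough illustration of what "clustering feature activations" can look like in practice, the sketch below groups the per-location feature vectors of a single encoder feature map into k clusters and returns a spatial label map. The abstract does not state which clustering algorithm NAVE uses, so plain k-means with a farthest-point initialisation is an assumption here, and the function name `cluster_activations`, its parameters, and the toy feature map are hypothetical.

```python
import numpy as np


def cluster_activations(features, k, n_iter=20, seed=0):
    """Cluster the per-location feature vectors of one feature map.

    features: array of shape (H, W, C), e.g. a grid of patch embeddings.
    Returns an (H, W) integer map assigning each location to one of k clusters.
    NOTE: k-means is an assumed stand-in, not necessarily what NAVE uses.
    """
    h, w, c = features.shape
    x = features.reshape(h * w, c).astype(np.float64)

    # Farthest-point initialisation: start from a random vector, then
    # repeatedly add the vector farthest from all centres chosen so far.
    rng = np.random.default_rng(seed)
    centres = [x[rng.integers(len(x))]]
    for _ in range(k - 1):
        d = np.min([np.sum((x - ctr) ** 2, axis=1) for ctr in centres], axis=0)
        centres.append(x[np.argmax(d)])
    centres = np.stack(centres)

    for _ in range(n_iter):
        # Assignment step: nearest centre by squared Euclidean distance.
        dists = ((x[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centre to the mean of its members.
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # keep the old centre if a cluster is empty
                centres[j] = members.mean(axis=0)

    return labels.reshape(h, w)


# Toy feature map (synthetic, not real encoder output): the left and right
# halves carry clearly different activations.
toy_rng = np.random.default_rng(0)
feats = toy_rng.normal(0.0, 0.1, size=(8, 8, 16))
feats[:, 4:, :] += 5.0
label_map = cluster_activations(feats, k=2)
```

In this toy case the label map splits the grid into its two halves. With real encoder activations, such a map would typically be upsampled to the input resolution and overlaid on the image, which is one way to compare the clusters against object locations.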