CV · IV · Sep 9, 2025

Feature Space Analysis by Guided Diffusion Model

arXiv:2509.07936v2 · h-index: 19
Originality: Incremental advance
AI Analysis

The paper addresses the interpretability problem for computer vision researchers and practitioners by providing a tool that analyses the feature spaces of various DNNs without additional training. The advance is incremental, since it builds on existing diffusion models.

The paper tackles the black-box nature of DNN feature extraction by proposing a decoder that generates images whose features are guaranteed to closely match a user-specified feature, which shows which image attributes that feature encodes. In experiments on CLIP's image encoder, ResNet-50 and a vision transformer, the generated images have features remarkably similar to the specified ones.

One of the key issues in Deep Neural Networks (DNNs) is the black-box nature of their internal feature extraction process. Targeting vision-related domains, this paper focuses on analysing the feature space of a DNN by proposing a decoder that can generate images whose features are guaranteed to closely match a user-specified feature. Owing to this guarantee, which past studies lack, our decoder lets us show which of various image attributes are encoded into the user-specified feature. Our decoder is implemented as a guided diffusion model that guides the reverse image generation of a pre-trained diffusion model to minimise the Euclidean distance between the feature of a clean image estimated at each step and the user-specified feature. One practical advantage of our decoder is that it can analyse the feature spaces of different DNNs with no additional training and runs on a single commercial off-the-shelf (COTS) GPU. The experimental results targeting CLIP's image encoder, ResNet-50 and a vision transformer demonstrate that images generated by our decoder have features remarkably similar to the user-specified ones and reveal valuable insights into these DNNs' feature spaces.
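The guidance idea in the abstract can be illustrated with a toy sketch. This is not the paper's implementation. It assumes a hypothetical linear feature extractor `W` with orthonormal rows in place of a real DNN, and a closed-form denoiser for a standard Gaussian prior in place of a pre-trained diffusion model. The function name `guided_sample`, the noise schedule, and the guidance scale are all illustrative choices. At each reverse step the code estimates the clean sample, takes one gradient step on the squared Euclidean distance between its feature and the target feature, and continues with a deterministic DDIM-style update.

```python
import numpy as np

def guided_sample(W, f_target, steps=50, scale=0.5, seed=0):
    """Toy feature-guided reverse diffusion (illustrative assumptions only)."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, steps)       # assumed linear noise schedule
    alpha_bar = np.cumprod(1.0 - betas)
    x = rng.standard_normal(W.shape[1])          # start from pure noise
    for t in range(steps - 1, -1, -1):
        ab_t = alpha_bar[t]
        ab_prev = alpha_bar[t - 1] if t > 0 else 1.0
        # Stand-in noise predictor: exact for data drawn from N(0, I).
        eps_hat = np.sqrt(1.0 - ab_t) * x
        # Clean-sample estimate at this step.
        x0_hat = (x - np.sqrt(1.0 - ab_t) * eps_hat) / np.sqrt(ab_t)
        # Gradient of ||W x0 - f_target||^2 with respect to x0.
        grad = 2.0 * W.T @ (W @ x0_hat - f_target)
        x0_guided = x0_hat - scale * grad
        # Noise consistent with the guided estimate, then a DDIM-style step.
        eps_guided = (x - np.sqrt(ab_t) * x0_guided) / np.sqrt(1.0 - ab_t)
        x = np.sqrt(ab_prev) * x0_guided + np.sqrt(1.0 - ab_prev) * eps_guided
    return x

# Hypothetical 4-dimensional "feature extractor" over 16-dimensional "images".
rng = np.random.default_rng(1)
q, _ = np.linalg.qr(rng.standard_normal((16, 4)))
W = q.T                                          # rows are orthonormal
f_target = W @ rng.standard_normal(16)           # user-specified feature

x_guided = guided_sample(W, f_target)
x_plain = guided_sample(W, f_target, scale=0.0)  # same loop without guidance
dist_guided = np.linalg.norm(W @ x_guided - f_target)
dist_plain = np.linalg.norm(W @ x_plain - f_target)
```

Because `W` has orthonormal rows, a single gradient step with `scale=0.5` removes the feature residual exactly, so `dist_guided` is zero up to rounding. A real network is nonlinear, so the gradient would come from automatic differentiation and the match would only be approximate.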

