CV · AI · Apr 22, 2024

Automatic Discovery of Visual Circuits

arXiv:2404.14349v1 · 12 citations · h-index: 8
Originality Highly original
AI Analysis

This paper addresses the need for scalable interpretability in vision models, offering a novel approach that reduces the human labor required to understand model computations.

The paper tackles the problem of automatically discovering interpretable computational subgraphs in deep vision models. It introduces a method that specifies a concept through a few examples and traces functional connectivity across layers to extract circuits that causally affect model output. Editing these circuits can defend the model against adversarial attacks.

To date, most discoveries of network subcomponents that implement human-interpretable computations in deep vision models have involved close study of single units and large amounts of human labor. We explore scalable methods for extracting the subgraph of a vision model's computational graph that underlies recognition of a specific visual concept. We introduce a new method for identifying these subgraphs: specifying a visual concept using a few examples, and then tracing the interdependence of neuron activations across layers, or their functional connectivity. We find that our approach extracts circuits that causally affect model output, and that editing these circuits can defend large pretrained models from adversarial attacks.
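The idea of tracing functional connectivity between layers can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that connectivity is measured as the Pearson correlation between unit activations in two layers over a handful of concept examples, and that a circuit is read off as the strongest edges. The function names `functional_connectivity` and `top_edges`, the toy activations, and the choice of correlation as the measure are all hypothetical.

```python
import numpy as np

def functional_connectivity(acts_a, acts_b):
    """Pearson correlation between every unit of layer A and every unit of layer B.

    acts_a: array of shape (n_examples, n_units_a)
    acts_b: array of shape (n_examples, n_units_b)
    Returns an array of shape (n_units_a, n_units_b).
    Assumption: correlation stands in for the paper's connectivity measure.
    """
    a = acts_a - acts_a.mean(axis=0)
    b = acts_b - acts_b.mean(axis=0)
    denom = np.outer(np.linalg.norm(a, axis=0), np.linalg.norm(b, axis=0))
    denom[denom == 0] = np.inf  # constant units get correlation 0
    return (a.T @ b) / denom

def top_edges(conn, k):
    """Return the k (unit_a, unit_b) pairs with the largest absolute correlation."""
    flat = np.argsort(np.abs(conn), axis=None)[::-1][:k]
    rows, cols = np.unravel_index(flat, conn.shape)
    return list(zip(rows.tolist(), cols.tolist()))

# Toy activations for 20 concept examples (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
layer_a = rng.normal(size=(20, 4))
layer_b = rng.normal(size=(20, 3))
layer_b[:, 0] = 2.0 * layer_a[:, 1]  # unit 0 of B depends entirely on unit 1 of A

conn = functional_connectivity(layer_a, layer_b)
print(top_edges(conn, k=1))
```

In a real model the activation matrices would come from forward passes on the concept examples, and a causal check (for instance, ablating the selected units and measuring the change in output) would still be needed before calling the result a circuit.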

Code Implementations: 1 repo