CV AI Apr 22, 2024

Automatic Discovery of Visual Circuits

Achyuta Rajaram, Neil Chowdhury, Antonio Torralba, Jacob Andreas, Sarah Schwettmann

arXiv:2404.14349v114.712 citations h-index: 8 Has Code

Originality: Highly original

AI Analysis

This paper addresses the need for scalable interpretability in vision models, offering an approach that reduces the human labor required to understand model computations.

The paper tackles the problem of automatically discovering interpretable computational subgraphs in deep vision models. It introduces a method that specifies a visual concept with a few examples and traces functional connectivity across layers to extract circuits that causally affect model output. Editing these circuits can defend the model against adversarial attacks.

To date, most discoveries of network subcomponents that implement human-interpretable computations in deep vision models have involved close study of single units and large amounts of human labor. We explore scalable methods for extracting the subgraph of a vision model's computational graph that underlies recognition of a specific visual concept. We introduce a new method for identifying these subgraphs: specifying a visual concept using a few examples, and then tracing the interdependence of neuron activations across layers, or their functional connectivity. We find that our approach extracts circuits that causally affect model output, and that editing these circuits can defend large pretrained models from adversarial attacks.
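As a rough illustration of what "tracing the interdependence of neuron activations across layers" could look like, the sketch below scores every pair of neurons in two adjacent layers by the Pearson correlation of their activations over a handful of concept examples, and keeps the strongly correlated pairs as candidate circuit edges. This is an assumption made for illustration only: the abstract does not say which dependence measure the authors use, and the function names, the threshold, and the toy activation values are all hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def connectivity_edges(layer_a, layer_b, threshold=0.9):
    """Return (i, j, r) for neuron pairs whose activations co-vary strongly.

    layer_a[i] and layer_b[j] hold the activations of neuron i and neuron j
    over the same list of concept examples. The threshold is a hypothetical
    cutoff, not a value from the paper.
    """
    edges = []
    for i, a in enumerate(layer_a):
        for j, b in enumerate(layer_b):
            r = pearson(a, b)
            if abs(r) >= threshold:
                edges.append((i, j, r))
    return edges


# Toy activations over four concept examples (made-up numbers).
layer1 = [[0.1, 0.9, 0.2, 0.8], [0.5, 0.5, 0.4, 0.6]]
layer2 = [[0.2, 1.8, 0.4, 1.6], [0.9, 0.1, 0.8, 0.2]]
edges = connectivity_edges(layer1, layer2)
```

In this toy data only neuron 0 of the first layer is linked to the second layer: positively to neuron 0 and negatively to neuron 1. In a real model the activations would have to be recorded from the network itself, and a correlation-based edge is only a candidate; the causal claim in the abstract would still need to be checked by intervening on the circuit.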
