Circuit Tracing in Vision-Language Models: Understanding the Internal Mechanisms of Multimodal Thinking

Jingcheng Yang, Tianhu Xiong, Shengyi Qian, Klara Nahrstedt, Mingyuan Wu

arXiv:2602.20330v14.02 citations; h-index: 77

Originality: Incremental advance

AI Analysis

This work addresses the opacity of VLMs, giving researchers and developers a foundational tool for explainability, though it is an incremental advance over existing interpretability methods.

The authors tackle the problem of understanding the internal mechanisms of vision-language models (VLMs) by introducing a framework for transparent circuit tracing. The framework reveals how VLMs hierarchically integrate visual and semantic concepts and identifies distinct circuits for tasks such as mathematical reasoning.

Vision-language models (VLMs) are powerful but remain opaque black boxes. We introduce the first framework for transparent circuit tracing in VLMs to systematically analyze multimodal reasoning. By utilizing transcoders, attribution graphs, and attention-based methods, we uncover how VLMs hierarchically integrate visual and semantic concepts. We reveal that distinct visual feature circuits can handle mathematical reasoning and support cross-modal associations. Validated through feature steering and circuit patching, our framework proves these circuits are causal and controllable, laying the groundwork for more explainable and reliable VLMs.
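
To make the validation steps named in the abstract concrete, here is a minimal sketch of feature steering and circuit patching on a toy transcoder. It is not the authors' implementation: the class `ToyTranscoder`, the functions `steer` and `patch`, the random weights, and the dimensions are all hypothetical and chosen only for illustration. The sketch assumes a transcoder is a sparse encoder followed by a linear decoder that approximates an MLP layer's output.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


class ToyTranscoder:
    """Hypothetical transcoder: ReLU encoder into features, linear decoder out."""

    def __init__(self, d_model, n_features, seed=0):
        rng = np.random.default_rng(seed)
        # Random weights stand in for a trained transcoder (assumption).
        self.W_enc = rng.normal(scale=0.5, size=(d_model, n_features))
        self.b_enc = np.zeros(n_features)
        self.W_dec = rng.normal(scale=0.5, size=(n_features, d_model))
        self.b_dec = np.zeros(d_model)

    def encode(self, x):
        # Feature activations for an MLP input x.
        return relu(x @ self.W_enc + self.b_enc)

    def decode(self, f):
        # Approximate MLP output reconstructed from feature activations.
        return f @ self.W_dec + self.b_dec


def steer(tc, x, feature_idx, scale):
    """Feature steering: rescale one feature's activation, then decode."""
    f = tc.encode(x).copy()
    f[..., feature_idx] *= scale
    return tc.decode(f)


def patch(tc, x_clean, x_corrupt, feature_idx):
    """Circuit patching: copy one feature's activation from a clean run
    into a corrupted run, then decode the corrupted run."""
    f = tc.encode(x_corrupt).copy()
    f[..., feature_idx] = tc.encode(x_clean)[..., feature_idx]
    return tc.decode(f)


if __name__ == "__main__":
    tc = ToyTranscoder(d_model=4, n_features=8)
    x = np.ones(4)
    baseline = tc.decode(tc.encode(x))
    # Effect of tripling feature 2 on the reconstructed output.
    print(steer(tc, x, feature_idx=2, scale=3.0) - baseline)
```

Because the decoder is linear, scaling feature `i` by `s` shifts the output by exactly `(s - 1)` times that feature's activation times its decoder row. In a real model the steered or patched output would be written back into the forward pass and the effect measured on the final logits.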
