AI · Mar 26

Sparse Visual Thought Circuits in Vision-Language Models

arXiv:2603.25075 · 55.6
Predicted impact: top 68% in AI · last 90 days
Originality: Incremental advance
AI Analysis

This work addresses unreliable control in vision-language models, a problem for both researchers and practitioners. It reveals limits on feature composability, which makes it an incremental advance over existing interpretability methods.

The study tested the modularity hypothesis of sparse autoencoder features in vision-language models and found that while intervening on task-selective features can slightly improve reasoning accuracy, combining feature sets often causes output drift and accuracy degradation due to shared internal pathways.

Sparse autoencoders (SAEs) improve interpretability in multimodal models, but it remains unclear whether SAE features form modular, composable units for reasoning, an assumption underlying many intervention-based steering methods. We test this modularity hypothesis and find it often fails: intervening on a task-selective feature set can modestly improve reasoning accuracy, while intervening on the union of two such sets reliably induces output drift (large unintended changes in predictions) and degrades accuracy, even under norm-matched perturbations. This non-modular circuit interference is consistent with shared internal pathways where feature unions amplify activation shifts. We develop a reproducible causal pipeline to localize and test these sparse visual thought circuits in Qwen3-VL-8B. On a controlled synthetic benchmark with seven task types and three difficulty levels, linear probes identify a mid-decoder locus for task-type information. We train SAEs at this layer, construct task-selective sets via an explicit rule, and perform inference-time scaling and ablation while quantifying accuracy and drift. Our findings, validated with bootstrapped subsamples and permutation controls and replicated across multiple VLM families and five diverse datasets, clarify the boundaries of SAE feature composability and provide a rigorous diagnostic framework for more reliable VLM control.
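The intervention described in the abstract (scaling a set of SAE features, optionally norm-matching the resulting perturbation, then measuring drift) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the paper's implementation: the SAE weights are random stand-ins rather than trained, the names sae_encode, intervene and drift are hypothetical, the readout W_out is a random linear classifier, and drift is taken to be the fraction of predictions that change, which may differ from the paper's definition.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_feat = 16, 64

# Hypothetical SAE weights: random stand-ins, NOT a trained autoencoder.
W_enc = rng.normal(size=(d_model, n_feat)) / np.sqrt(d_model)
W_dec = rng.normal(size=(n_feat, d_model)) / np.sqrt(n_feat)
b_enc = np.zeros(n_feat)


def sae_encode(h):
    """ReLU encoder: hidden states (n, d_model) -> feature activations (n, n_feat)."""
    return np.maximum(h @ W_enc + b_enc, 0.0)


def intervene(h, feature_ids, scale, target_norm=None):
    """Scale the selected features and add the decoded change back to h.

    scale=0.0 ablates the features, scale>1.0 amplifies them.
    If target_norm is given, each row's perturbation is rescaled to that
    L2 norm (rows whose perturbation is exactly zero stay zero).
    """
    f = sae_encode(h)
    delta_f = np.zeros_like(f)
    delta_f[:, feature_ids] = (scale - 1.0) * f[:, feature_ids]
    delta_h = delta_f @ W_dec
    if target_norm is not None:
        norms = np.linalg.norm(delta_h, axis=1, keepdims=True)
        delta_h = delta_h * (target_norm / np.maximum(norms, 1e-12))
    return h + delta_h


def drift(pred_base, pred_new):
    """Assumed drift metric: fraction of examples whose prediction changed."""
    return float(np.mean(pred_base != pred_new))


# Toy comparison: two overlapping feature sets and their union.
h = rng.normal(size=(200, d_model))
W_out = rng.normal(size=(d_model, 5))      # hypothetical 5-way readout
base = np.argmax(h @ W_out, axis=1)

set_a = np.arange(0, 8)
set_b = np.arange(4, 12)
union = np.union1d(set_a, set_b)

# Use the mean perturbation norm of set A as the matching budget for the union.
budget = np.linalg.norm(intervene(h, set_a, 2.0) - h, axis=1).mean()

for name, ids, norm in [("A", set_a, None), ("B", set_b, None),
                        ("A∪B", union, None), ("A∪B matched", union, budget)]:
    pred = np.argmax(intervene(h, ids, 2.0, target_norm=norm) @ W_out, axis=1)
    print(f"{name}: drift = {drift(base, pred):.3f}")
```

With random weights the printed numbers carry no scientific meaning; the sketch only shows where norm matching enters, so that a larger drift for the union cannot be attributed solely to a larger perturbation magnitude.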

