CV · CL · Apr 20

From Heads to Neurons: Causal Attribution and Steering in Multi-Task Vision-Language Models

arXiv:2604.17941 · 30.7 · h-index: 6 · Has Code
Predicted impact: top 17% in CV · last 90 days
Originality: Incremental advance
AI Analysis

For researchers working on interpretability and steering of multi-task vision-language models, this provides a method to identify and modulate task-critical neurons more accurately.

Existing neuron analysis in VLMs focuses on single tasks, overlooking task-dependent information pathways. HONES, a gradient-free framework, ranks FFN neurons by causal write-in contributions conditioned on task-relevant heads and steers them via scaling, outperforming existing methods on four multimodal tasks.

Recent work has increasingly explored neuron-level interpretation in vision-language models (VLMs) to identify neurons critical to final predictions. However, existing neuron analyses generally focus on single tasks, limiting the comparability of neuron importance across tasks. Moreover, ranking strategies tend to score neurons in isolation, overlooking how task-dependent information pathways shape the write-in effects of feed-forward network (FFN) neurons. This oversight can exacerbate neuron polysemanticity in multi-task settings, introducing noise into the identification and intervention of task-critical neurons. In this study, we propose HONES (Head-Oriented Neuron Explanation & Steering), a gradient-free framework for task-aware neuron attribution and steering in multi-task VLMs. HONES ranks FFN neurons by their causal write-in contributions conditioned on task-relevant attention heads, and further modulates salient neurons via lightweight scaling. Experiments on four diverse multimodal tasks and two popular VLMs show that HONES outperforms existing methods in identifying task-critical neurons and improves model performance after steering. Our source code is released at: https://github.com/petergit1/HONES.
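The two ideas named in the abstract, gradient-free neuron ranking and steering by scaling, can be illustrated with a toy example. The sketch below is not the HONES implementation: it uses a hypothetical one-layer ReLU FFN, scores each neuron by how much zeroing it reduces the output's projection onto an assumed task direction, and does not model the conditioning on task-relevant attention heads. All function names, weights, and the `direction` vector are illustrative assumptions.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def ffn_forward(x, w_in, w_out, scale=None):
    """Toy FFN: output = sum_j scale_j * relu(w_in[j] . x) * w_out[j].

    `scale` maps a neuron index to a multiplier (default 1.0), so the same
    function covers ablation (0.0) and steering (any other factor).
    """
    scale = scale or {}
    hidden = [max(0.0, dot(row, x)) * scale.get(j, 1.0)
              for j, row in enumerate(w_in)]
    d_out = len(w_out[0])
    return [sum(hidden[j] * w_out[j][k] for j in range(len(hidden)))
            for k in range(d_out)]


def rank_neurons_by_ablation(x, w_in, w_out, direction):
    """Gradient-free ranking (an assumed stand-in score, not the paper's):
    the drop in projection onto `direction` when neuron j is zeroed."""
    base = dot(ffn_forward(x, w_in, w_out), direction)
    scores = {}
    for j in range(len(w_in)):
        ablated = ffn_forward(x, w_in, w_out, scale={j: 0.0})
        scores[j] = base - dot(ablated, direction)
    return sorted(scores, key=scores.get, reverse=True)


def steer(x, w_in, w_out, neurons, alpha):
    """Multiply the activations of the selected neurons by `alpha`."""
    return ffn_forward(x, w_in, w_out, scale={j: alpha for j in neurons})


# Hypothetical toy weights: 2 inputs, 3 hidden neurons, 2 outputs.
x = [1.0, 2.0]
w_in = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
w_out = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
direction = [0.0, 1.0]  # assumed "task" direction in output space

ranking = rank_neurons_by_ablation(x, w_in, w_out, direction)
steered = steer(x, w_in, w_out, neurons=ranking[:1], alpha=2.0)
```

In this toy setup only the second neuron writes into the chosen direction, so it ranks first, and scaling it by 2 doubles that output component while leaving the other unchanged. A real VLM would apply the same kind of scaling through forward hooks on the FFN activations, with the neuron set chosen by the paper's head-conditioned attribution rather than this simple ablation score.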

