Conflict Adaptation in Vision-Language Models
This work, aimed at researchers in cognitive science and AI, addresses how cognitive control mechanisms arise in AI models; its contribution is incremental, since it applies a known human phenomenon to VLMs.
The study used a sequential Stroop task to test whether vision-language models (VLMs) exhibit conflict adaptation, a human cognitive control phenomenon. It found behavior consistent with conflict adaptation in 12 of 13 VLMs, identified task-relevant supernodes in InternVL 3.5 4B, and showed that ablating a conflict-modulated supernode increases Stroop errors.
A signature of human cognitive control is conflict adaptation: improved performance on a high-conflict trial following another high-conflict trial. This phenomenon offers an account of how cognitive control, a scarce resource, is recruited. Using a sequential Stroop task, we find that 12 of 13 vision-language models (VLMs) tested exhibit behavior consistent with conflict adaptation, with the lone exception likely reflecting a ceiling effect. To understand the representational basis of this behavior, we use sparse autoencoders (SAEs) to identify task-relevant supernodes in InternVL 3.5 4B. Partially overlapping supernodes emerge for text and color in both early and late layers, and their relative sizes mirror the automaticity asymmetry between reading and color naming in humans. We further isolate a conflict-modulated supernode in layers 24-25 whose ablation significantly increases Stroop errors while minimally affecting congruent trials.
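As a rough illustration of how conflict adaptation can be quantified, the sketch below computes a congruency sequence effect from trial-level Stroop outcomes. It assumes each trial is recorded only as congruent or incongruent (low vs. high conflict) plus whether the model named the ink color correctly, and it uses error rates rather than reaction times. The `Trial` record, the helper names, and the exact difference-of-differences metric are illustrative assumptions and may not match the paper's own analysis.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Trial:
    congruent: bool  # True if the word matches its ink color (low conflict)
    correct: bool    # True if the model named the ink color correctly


def error_rate(trials: Sequence[Trial]) -> float:
    """Fraction of trials answered incorrectly."""
    if not trials:
        raise ValueError("empty condition: cannot compute an error rate")
    return sum(not t.correct for t in trials) / len(trials)


def sequence_cells(trials: Sequence[Trial]) -> dict[str, list[Trial]]:
    """Group trials 2..n by (previous, current) congruency.

    Lowercase letter = previous trial, uppercase = current trial;
    'c'/'C' = congruent, 'i'/'I' = incongruent (e.g. 'cI' = incongruent after congruent).
    """
    cells: dict[str, list[Trial]] = {"cC": [], "cI": [], "iC": [], "iI": []}
    for prev, cur in zip(trials, trials[1:]):
        key = ("c" if prev.congruent else "i") + ("C" if cur.congruent else "I")
        cells[key].append(cur)
    return cells


def congruency_sequence_effect(trials: Sequence[Trial]) -> float:
    """Stroop effect after congruent trials minus Stroop effect after incongruent trials.

    Positive values are consistent with conflict adaptation.
    """
    err = {k: error_rate(v) for k, v in sequence_cells(trials).items()}
    return (err["cI"] - err["cC"]) - (err["iI"] - err["iC"])


# Toy sequence of (congruent, correct) pairs in presentation order (hypothetical data).
toy = [Trial(c, ok) for c, ok in [
    (True, True), (False, False), (False, True), (True, True),
    (True, True), (False, False), (False, True), (True, True),
]]
print(congruency_sequence_effect(toy))  # 1.0
```

In the toy sequence, every incongruent trial that follows a congruent one is answered wrongly, while every incongruent trial that follows an incongruent one is answered correctly, so the Stroop effect vanishes after high-conflict trials and the metric is positive. Subtracting the congruent cells (`cC`, `iC`) controls for general shifts in accuracy that are unrelated to conflict.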