Pathological Truth Bias in Vision-Language Models
This work addresses trust in VLMs for real-world applications by exposing hidden biases, though it is incremental because it builds on existing audit methods.
The paper tackles systematic failures in vision-language models (VLMs), in which a model agrees with statements that the image visibly contradicts. It introduces MATS to measure this behavior and finds that instruction-tuned generative VLMs such as LLaVA 1.5 and Qwen-VL-Chat perform poorly, while contrastive encoders such as CLIP and SigLIP are more robust. Activation patching identifies the failure loci, pointing to potential repairs.
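In its generic form, activation patching runs a model on a clean input and on a corrupted input, copies one internal activation from the clean run into the corrupted run, and measures how much of the clean output that single swap restores. Components that restore most of the gap are candidate failure loci. The summary does not specify the paper's patching protocol, so the sketch below only illustrates the generic technique on a hypothetical three-component toy model; the model, its weights, and all function names are illustrative assumptions.

```python
from typing import List, Optional, Tuple

Vector = List[float]

# Hypothetical toy model: three hidden "components" (standing in for, e.g.,
# attention heads) followed by a fixed linear readout. Not the paper's model.
READOUT_WEIGHTS: Vector = [0.5, 1.0, 0.0]  # component 2 is ignored by the readout


def hidden_components(x: Vector) -> Vector:
    """Compute the toy model's intermediate activations."""
    return [x[0] + x[1], 3.0 * x[1], x[0] - x[2]]


def forward(x: Vector, patch: Optional[Tuple[int, float]] = None) -> float:
    """Run the toy model; patch=(i, value) overwrites hidden component i."""
    hidden = hidden_components(x)
    if patch is not None:
        index, value = patch
        hidden[index] = value
    return sum(w * h for w, h in zip(READOUT_WEIGHTS, hidden))


def patching_effects(clean: Vector, corrupted: Vector) -> Vector:
    """Fraction of the clean-vs-corrupted output gap restored by patching each
    hidden component from the clean run into the corrupted run."""
    clean_hidden = hidden_components(clean)
    y_clean = forward(clean)
    y_corrupted = forward(corrupted)
    gap = y_clean - y_corrupted
    if gap == 0:
        raise ValueError("clean and corrupted inputs give the same output")
    return [
        (forward(corrupted, patch=(i, value)) - y_corrupted) / gap
        for i, value in enumerate(clean_hidden)
    ]


if __name__ == "__main__":
    effects = patching_effects(clean=[1.0, 2.0, 0.0], corrupted=[1.0, 0.0, 0.0])
    for i, effect in enumerate(effects):
        # prints 0.14, 0.86 and 0.00 for components 0, 1 and 2
        print(f"component {i}: restores {effect:.2f} of the gap")
```

Here component 1 restores about 0.86 of the gap, so the toy would flag it as the main locus. In a real VLM the patched units would be attention heads or projection outputs, captured and overwritten with something like PyTorch forward hooks, and the output would be the model's agree/disagree score.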
Vision-Language Models (VLMs) are improving quickly, but standard benchmarks can hide systematic failures that reduce real-world trust. We introduce MATS (Multimodal Audit for Truthful Spatialization), a compact behavioral audit that measures whether models reject visually contradicted statements, along with two metrics: Spatial Consistency Score (SCS) and Incorrect Agreement Rate (IAR). Instruction-tuned generative VLMs (LLaVA 1.5, Qwen-VL-Chat) exhibit very low SCS and high IAR, while contrastive encoders (CLIP, SigLIP) are far more robust. Activation patching causally localizes failure loci (mid-to-late cross-attention for generative models, pooled projection components for contrastive models) and suggests concrete repair paths.
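The abstract names SCS and IAR but does not define them. The sketch below assumes plausible definitions, which are hypothetical and not taken from the paper: IAR as the fraction of visually contradicted statements the model agrees with, and SCS as the fraction of audit pairs in which the model accepts the image-consistent statement and also rejects its contradicted counterpart. The `AuditItem` type, the example statements, and the function names are illustrative.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class AuditItem:
    """Hypothetical audit pair for one image: the model's yes/no verdicts on a
    statement consistent with the image (e.g. "the cup is left of the plate")
    and on a visually contradicted variant ("the cup is right of the plate")."""
    agrees_with_consistent: bool
    agrees_with_contradicted: bool


def incorrect_agreement_rate(items: Sequence[AuditItem]) -> float:
    """ASSUMED IAR: share of contradicted statements the model agrees with."""
    if not items:
        raise ValueError("need at least one audit item")
    return sum(item.agrees_with_contradicted for item in items) / len(items)


def spatial_consistency_score(items: Sequence[AuditItem]) -> float:
    """ASSUMED SCS: share of pairs where the model accepts the consistent
    statement AND rejects the contradicted one."""
    if not items:
        raise ValueError("need at least one audit item")
    consistent = sum(
        item.agrees_with_consistent and not item.agrees_with_contradicted
        for item in items
    )
    return consistent / len(items)


if __name__ == "__main__":
    verdicts = [
        AuditItem(True, False),   # correct on both statements
        AuditItem(True, True),    # agrees with the contradiction (truth bias)
        AuditItem(False, True),   # wrong on both
        AuditItem(False, False),  # rejects both
    ]
    print("IAR:", incorrect_agreement_rate(verdicts))  # 0.5
    print("SCS:", spatial_consistency_score(verdicts))  # 0.25
```

Under these assumed definitions, a model that tends to say "yes" regardless of the image pushes IAR up and SCS down together, which is the pattern the abstract reports for the instruction-tuned generative VLMs.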