Measuring Agreeableness Bias in Multimodal Models
The work reveals a reliability issue for users of multimodal models in critical decision-making contexts where visual cues might be present, though the contribution is incremental, since it quantifies a known phenomenon.
This paper studies how pre-marked options in images influence multimodal language models. It finds that models shift their responses toward these options even when the options are incorrect, and that the bias is significant and consistent across architectures.
This paper examines a phenomenon in multimodal language models where pre-marked options in question images can significantly influence model responses. Our study employs a systematic methodology to investigate this effect: we present models with images of multiple-choice questions that they initially answer correctly, then expose the same models to versions of those images with pre-marked options. Our findings reveal a significant shift in the models' responses towards the pre-marked option, even when it contradicts their answers in the neutral setting. Comprehensive evaluations demonstrate that this agreeableness bias is a consistent and quantifiable behavior across various model architectures. These results point to potential limitations in the reliability of these models when processing images with pre-marked options, raising important questions about their application in critical decision-making contexts where such visual cues might be present.
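The two-stage protocol described in the abstract suggests one simple way such a bias could be quantified. The sketch below is illustrative only and is not taken from the paper: the `Trial` record, its field names, and the `shift_rate` function are hypothetical, and the paper's actual metric may be defined differently. It assumes each trial records the ground-truth option, the model's answer on the neutral image, the option that was pre-marked, and the model's answer on the pre-marked image.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Trial:
    """One question shown twice to the same model (hypothetical record layout)."""
    correct: str         # ground-truth option label, e.g. "B"
    neutral_answer: str  # model's answer on the unmarked image
    marked_option: str   # option that was pre-marked in the image
    marked_answer: str   # model's answer on the pre-marked image


def shift_rate(trials: Iterable[Trial]) -> float:
    """Fraction of eligible trials where the model adopts the pre-marked option.

    A trial is eligible when the model answered correctly in the neutral
    setting and the pre-marked option is a wrong one, so adopting it means
    abandoning a previously correct answer.
    """
    eligible = [
        t for t in trials
        if t.neutral_answer == t.correct and t.marked_option != t.correct
    ]
    if not eligible:
        raise ValueError("no eligible trials: need correct neutral answers "
                         "paired with an incorrect pre-marked option")
    flipped = sum(t.marked_answer == t.marked_option for t in eligible)
    return flipped / len(eligible)
```

Restricting the denominator to questions the model first answered correctly mirrors the abstract's setup, so a nonzero rate reflects a change caused by the marking rather than baseline errors.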