CV, AI, CY, HC, RO
May 8, 2025

Mapping User Trust in Vision Language Models: Research Landscape, Challenges, and Prospects

arXiv:2505.05318v1
3 citations
h-index: 14
Originality: Synthesis-oriented
AI Analysis

The paper addresses the need to inform and protect users in VLM interactions, but as a survey its contribution is incremental.

This survey reviews research on user trust in Vision Language Models (VLMs), analyzing trust dynamics through a multi-disciplinary taxonomy and proposing preliminary requirements for future studies based on literature and user workshops.

The rapid adoption of Vision Language Models (VLMs), pre-trained on large image-text and video-text datasets, calls for protecting and informing users about when to trust these systems. This survey reviews studies on trust dynamics in user-VLM interactions, through a multi-disciplinary taxonomy encompassing different cognitive science capabilities, collaboration modes, and agent behaviours. Literature insights and findings from a workshop with prospective VLM users inform preliminary requirements for future VLM trust studies.
