CV · LG · Feb 4

Visual concept ranking uncovers medical shortcuts used by large multimodal models

arXiv:2602.05096v1
Originality: Incremental advance
AI Analysis

This paper addresses the reliability of LMMs in safety-critical medical domains, though the advance is incremental because it builds on existing auditing methods.

The paper audits large multimodal models (LMMs) in healthcare by introducing Visual Concept Ranking (VCR), a method for identifying the visual concepts a model relies on. It first shows unexpected performance gaps across demographic subgroups in skin lesion classification, then uses VCR to generate hypotheses about visual feature dependencies and validates them with manual interventions.

Ensuring the reliability of machine learning models in safety-critical domains such as healthcare requires auditing methods that can uncover model shortcomings. We introduce a method for identifying important visual concepts within large multimodal models (LMMs) and use it to investigate the behaviors these models exhibit when prompted with medical tasks. We primarily focus on the task of classifying malignant skin lesions from clinical dermatology images, with supplemental experiments including both chest radiographs and natural images. After showing how LMMs display unexpected gaps in performance between different demographic subgroups when prompted with demonstrating examples, we apply our method, Visual Concept Ranking (VCR), to these models and prompts. VCR generates hypotheses related to different visual feature dependencies, which we are then able to validate with manual interventions.
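To make the idea of a "gap in performance between demographic subgroups" concrete, here is a minimal sketch of how such a gap could be measured. It is not the paper's VCR method or its evaluation code. The function names, the record layout (subgroup, label, prediction), and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def subgroup_accuracy(records):
    """Return accuracy per subgroup.

    `records` is an iterable of (subgroup, label, prediction) tuples.
    This layout is a hypothetical choice for the sketch, not the paper's.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for subgroup, label, prediction in records:
        total[subgroup] += 1
        correct[subgroup] += int(label == prediction)
    return {group: correct[group] / total[group] for group in total}


def max_accuracy_gap(records):
    """Return the difference between the best and worst subgroup accuracy."""
    accuracies = subgroup_accuracy(records)
    return max(accuracies.values()) - min(accuracies.values())


# Toy, made-up records: 1 = malignant, 0 = benign. Not data from the paper.
toy_records = [
    ("group_a", 1, 1),
    ("group_a", 0, 0),
    ("group_a", 1, 0),
    ("group_a", 0, 0),
    ("group_b", 1, 0),
    ("group_b", 0, 0),
    ("group_b", 1, 0),
    ("group_b", 0, 1),
]

if __name__ == "__main__":
    print(subgroup_accuracy(toy_records))
    print(max_accuracy_gap(toy_records))
```

A large gap on real data would only flag that something differs between subgroups. Finding which visual features drive it is the part the abstract attributes to VCR and the follow-up manual interventions.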

