CV | LG | May 23, 2025

U2-BENCH: Benchmarking Large Vision-Language Models on Ultrasound Understanding

arXiv:2505.17779v2 | 6 citations | h-index: 16
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for standardized evaluation of LVLMs in medical ultrasound, a critical healthcare domain, but is incremental as it applies existing benchmarking approaches to a new modality.

The authors address the evaluation of large vision-language models (LVLMs) on ultrasound understanding, a task made difficult by variable image quality and the lack of prior benchmarks. They introduce U2-BENCH, a comprehensive benchmark with 7,241 cases across 15 anatomical regions and 8 tasks. The evaluation reveals strong performance on image-level classification but persistent challenges in spatial reasoning and clinical language generation.

Ultrasound is a widely used imaging modality critical to global healthcare, yet its interpretation remains challenging because image quality varies with the operator, noise, and anatomical structure. Although large vision-language models (LVLMs) have demonstrated impressive multimodal capabilities across natural and medical domains, their performance on ultrasound remains largely unexplored. We introduce U2-BENCH, the first comprehensive benchmark to evaluate LVLMs on ultrasound understanding across classification, detection, regression, and text generation tasks. U2-BENCH aggregates 7,241 cases spanning 15 anatomical regions and defines 8 clinically inspired tasks, such as diagnosis, view recognition, lesion localization, clinical value estimation, and report generation, across 50 ultrasound application scenarios. We evaluate 20 state-of-the-art LVLMs, both open- and closed-source, general-purpose and medical-specific. Our results reveal strong performance on image-level classification, but persistent challenges in spatial reasoning and clinical language generation. U2-BENCH establishes a rigorous and unified testbed to assess and accelerate LVLM research in the uniquely multimodal domain of medical ultrasound imaging.

