HCCLMAMar 26, 2025

3MDBench: Medical Multimodal Multi-agent Dialogue Benchmark

arXiv:2504.13861v35 citationsh-index: 5Has CodeEMNLP
Originality Incremental advance
AI Analysis

This work addresses the problem of evaluating LVLMs in realistic medical consultations for researchers and practitioners, though it is incremental as it builds on existing LVLM and benchmark methods.

The paper tackles the underexplored ability of Large Vision-Language Models (LVLMs) in complex telemedicine consultations by introducing 3MDBench, a benchmark for simulating and evaluating LVLM-driven dialogues, showing that multimodal dialogue improves F1 score by 6.5% and integrating a diagnostic CNN boosts F1 by up to 20%.

Though Large Vision-Language Models (LVLMs) are being actively explored in medicine, their ability to conduct complex real-world telemedicine consultations combining accurate diagnosis with professional dialogue remains underexplored. This paper presents 3MDBench (Medical Multimodal Multi-agent Dialogue Benchmark), an open-source framework for simulating and evaluating LVLM-driven telemedical consultations. 3MDBench simulates patient variability through temperament-based Patient Agent and evaluates diagnostic accuracy and dialogue quality via Assessor Agent. It includes 2996 cases across 34 diagnoses from real-world telemedicine interactions, combining textual and image-based data. The experimental study compares diagnostic strategies for widely used open and closed-source LVLMs. We demonstrate that multimodal dialogue with internal reasoning improves F1 score by 6.5% over non-dialogue settings, highlighting the importance of context-aware, information-seeking questioning. Moreover, injecting predictions from a diagnostic convolutional neural network into the LVLM's context boosts F1 by up to 20%. Source code is available at https://github.com/univanxx/3mdbench.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes