CV · AI · Oct 23, 2025

3DReasonKnee: Advancing Grounded Reasoning in Medical Vision Language Models

arXiv:2510.20967v1 · 3 citations · h-index: 38 · Has Code · Pac Symp Biocomput
Originality: Incremental advance
AI Analysis

The paper addresses the need for clinically aligned, trustworthy AI in medical diagnostics by providing a dataset intended to improve 3D reasoning in VLMs. The advance is incremental because the contribution is dataset creation for a single domain, knee MRI.

The paper tackles the problem that Vision-Language Models struggle to ground anatomical regions in 3D medical images and to reason about them step by step. It introduces 3DReasonKnee, a dataset of 494k quintuples derived from 7,970 3D knee MRI volumes, which includes clinician-generated reasoning steps and severity assessments. It also benchmarks five state-of-the-art VLMs on localization and diagnostic accuracy.

Current Vision-Language Models (VLMs) struggle to ground anatomical regions in 3D medical images and reason about them in a step-by-step manner, a key requirement of real-world diagnostic assessment. This ability is essential for aligning model outputs with the diagnostic workflows clinicians use in practice, enabling trustworthy clinician-AI collaboration. Existing 3D datasets provide localization labels, but none support this "grounded reasoning" ability. To address this gap, we introduce 3DReasonKnee, the first 3D grounded reasoning dataset for medical images, which provides 494k high-quality quintuples derived from 7,970 3D knee MRI volumes. Each quintuple includes: (1) the 3D MRI volume, (2) a diagnostic question targeting a specific anatomical region, (3) a 3D bounding box localizing the relevant anatomical structures, (4) clinician-generated diagnostic reasoning steps that explicitly detail the 3D reasoning process, and (5) structured severity assessments for the relevant anatomical region. The creation and validation of 3DReasonKnee, involving over 450 hours of expert clinician time for manually segmenting MRIs and generating reasoning chains, ensure its superior quality and clinical relevance. We establish ReasonKnee-Bench to evaluate localization and diagnostic accuracy, providing insight into VLMs' ability to perform grounding and severity assessment across anatomical regions and diagnostic inquiries. We benchmark five state-of-the-art VLMs, providing baseline performance for ReasonKnee-Bench. By providing this unique resource of expert-annotated 3D reasoning pathways, 3DReasonKnee serves as a repository of orthopedic surgeons' diagnostic expertise and offers a vital testbed for advancing multimodal medical AI systems towards 3D, clinically aligned, localized decision-making capabilities. The dataset can be found at: https://huggingface.co/datasets/rajpurkarlab/3DReasonKnee
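
To make the quintuple structure concrete, here is a minimal Python sketch of how one sample could be represented and how a predicted 3D bounding box could be compared against an annotated one. Everything in it is an assumption for illustration: the class name `GroundedReasoningSample`, its field names, the box layout `(x_min, y_min, z_min, x_max, y_max, z_max)`, and the integer severity grade are hypothetical and do not reflect the dataset's actual schema. Likewise, 3D intersection over union (IoU) is a common localization measure, but the abstract does not state which metric ReasonKnee-Bench uses.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical box layout: (x_min, y_min, z_min, x_max, y_max, z_max) in voxel coordinates.
Box3D = Tuple[float, float, float, float, float, float]


@dataclass
class GroundedReasoningSample:
    """One quintuple as described in the abstract; field names are illustrative only."""
    volume_path: str            # (1) path to the 3D MRI volume
    question: str               # (2) diagnostic question targeting an anatomical region
    box: Box3D                  # (3) 3D bounding box around the relevant structures
    reasoning_steps: List[str]  # (4) clinician-generated reasoning chain
    severity: int               # (5) structured severity grade (assumed to be an integer)


def box_volume(box: Box3D) -> float:
    """Volume of an axis-aligned box; zero if any side is degenerate or inverted."""
    x0, y0, z0, x1, y1, z1 = box
    return max(0.0, x1 - x0) * max(0.0, y1 - y0) * max(0.0, z1 - z0)


def iou_3d(a: Box3D, b: Box3D) -> float:
    """Intersection over union of two axis-aligned 3D boxes."""
    # The overlap region runs from the larger of the minima to the smaller of the maxima.
    inter_box = (
        max(a[0], b[0]), max(a[1], b[1]), max(a[2], b[2]),
        min(a[3], b[3]), min(a[4], b[4]), min(a[5], b[5]),
    )
    inter = box_volume(inter_box)
    union = box_volume(a) + box_volume(b) - inter
    return inter / union if union > 0 else 0.0
```

With this layout, a model's predicted box can be scored with `iou_3d(predicted_box, sample.box)`, and a predicted severity grade can be compared to `sample.severity` for diagnostic accuracy.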
