CV · AI · Apr 17

Beyond a Single Frame: Multi-Frame Spatially Grounded Reasoning Across Volumetric MRI

Georgia Tech
arXiv:2604.15808 · 17.2 · h-index: 10
Predicted impact: top 35% in CV · last 90 days
Originality: Incremental advance
AI Analysis

This benchmark addresses the lack of volumetric, spatially grounded evaluation for medical VLMs, enabling assessment of multi-frame reasoning across clinical imaging.

The authors introduce SGMRI-VQA, a 41,307-pair benchmark for multi-frame spatially grounded reasoning on volumetric MRI, and show that supervised fine-tuning of Qwen3-VL-8B with bounding box supervision improves grounding performance over zero-shot baselines.

Spatial reasoning and visual grounding are core capabilities for vision-language models (VLMs), yet most medical VLMs produce predictions without transparent reasoning or spatial evidence. Existing benchmarks also evaluate VLMs on isolated 2D images, overlooking the volumetric nature of clinical imaging, where findings can span multiple frames or appear on only a few slices. We introduce Spatially Grounded MRI Visual Question Answering (SGMRI-VQA), a 41,307-pair benchmark for multi-frame, spatially grounded reasoning on volumetric MRI. Built from expert radiologist annotations in the fastMRI+ dataset across brain and knee studies, each QA pair includes a clinician-aligned chain-of-thought trace with frame-indexed bounding box coordinates. Tasks are organized hierarchically across detection, localization, counting/classification, and captioning, requiring models to jointly reason about what is present, where it is, and across which frames it extends. We benchmark 10 VLMs and show that supervised fine-tuning of Qwen3-VL-8B with bounding box supervision consistently improves grounding performance over strong zero-shot baselines, indicating that targeted spatial supervision is an effective path toward grounded clinical reasoning.
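To make the idea of a QA pair with frame-indexed bounding boxes concrete, here is a minimal Python sketch. It is not the benchmark's actual schema or scoring code: the `GroundedQA` class, its field names, the `(x_min, y_min, x_max, y_max)` box convention, and the two helper metrics are all assumptions chosen for illustration. The abstract does not specify the file format or the grounding metric, so plain intersection-over-union (IoU) and frame-level precision/recall are swapped in as generic stand-ins.

```python
from dataclasses import dataclass, field

# Hypothetical box convention: (x_min, y_min, x_max, y_max) in pixels.
Box = tuple[float, float, float, float]


@dataclass
class GroundedQA:
    """Illustrative record for one QA pair; field names are assumptions."""
    question: str
    answer: str
    reasoning: str  # chain-of-thought trace as free text
    # Maps a frame (slice) index to the boxes annotated on that frame.
    boxes_by_frame: dict[int, list[Box]] = field(default_factory=dict)


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def frame_precision_recall(pred: GroundedQA, ref: GroundedQA) -> tuple[float, float]:
    """Precision and recall over the sets of frame indices that carry boxes."""
    pred_frames = set(pred.boxes_by_frame)
    ref_frames = set(ref.boxes_by_frame)
    hits = len(pred_frames & ref_frames)
    precision = hits / len(pred_frames) if pred_frames else 0.0
    recall = hits / len(ref_frames) if ref_frames else 0.0
    return precision, recall


# Placeholder records; the strings and coordinates are made up for the sketch.
reference = GroundedQA(
    question="placeholder question",
    answer="placeholder answer",
    reasoning="placeholder trace",
    boxes_by_frame={4: [(0.0, 0.0, 2.0, 2.0)], 5: [(0.0, 0.0, 2.0, 2.0)]},
)
prediction = GroundedQA(
    question="placeholder question",
    answer="placeholder answer",
    reasoning="placeholder trace",
    boxes_by_frame={3: [(1.0, 1.0, 3.0, 3.0)], 4: [(1.0, 1.0, 3.0, 3.0)]},
)

precision, recall = frame_precision_recall(prediction, reference)
iou_on_frame_4 = box_iou(prediction.boxes_by_frame[4][0], reference.boxes_by_frame[4][0])
```

Keying the boxes by frame index keeps the two questions the abstract separates apart: which frames a finding extends across (frame-level precision and recall) and where it sits within a shared frame (per-box IoU).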

