CV | AI | CL | Dec 19, 2025

RadImageNet-VQA: A Large-Scale CT and MRI Dataset for Radiologic Visual Question Answering

arXiv:2512.17396v12 citations | h-index: 11 | Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses the need among AI researchers for better radiologic VQA datasets, though the contribution is incremental, since it builds on existing VQA frameworks.

The authors address the limited scale and text-based shortcuts of existing medical visual question answering (VQA) datasets by introducing RadImageNet-VQA, a large-scale dataset with 750K CT and MRI images and 7.5M question-answer samples. Their experiments show that state-of-the-art models struggle with fine-grained pathology identification, and that model performance drops to near-random when image inputs are withheld.

In this work, we introduce RadImageNet-VQA, a large-scale dataset designed to advance radiologic visual question answering (VQA) on CT and MRI exams. Existing medical VQA datasets are limited in scale, dominated by X-ray imaging or biomedical illustrations, and often prone to text-based shortcuts. RadImageNet-VQA is built from expert-curated annotations and provides 750K images paired with 7.5M question-answer samples. It covers three key tasks - abnormality detection, anatomy recognition, and pathology identification - spanning eight anatomical regions and 97 pathology categories, and supports open-ended, closed-ended, and multiple-choice questions. Extensive experiments show that state-of-the-art vision-language models still struggle with fine-grained pathology identification, particularly in open-ended settings and even after fine-tuning. Text-only analysis further reveals that model performance collapses to near-random without image inputs, confirming that RadImageNet-VQA is free from linguistic shortcuts. The full dataset and benchmark are publicly available at https://huggingface.co/datasets/raidium/RadImageNet-VQA.
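The text-only analysis mentioned in the abstract can be illustrated with a minimal sketch: score a model that sees only the question text and compare its accuracy with the chance level for multiple-choice items. Everything below is an assumption made for illustration, not the authors' evaluation code: the `MCQuestion` fields, the toy items, the toy predictions, and the 0.05 tolerance are all hypothetical, and the actual dataset schema may differ.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class MCQuestion:
    """Hypothetical multiple-choice item; field names are illustrative only."""
    question: str
    options: Sequence[str]
    answer: str


def accuracy(predictions: Sequence[str], items: Sequence[MCQuestion]) -> float:
    """Fraction of predictions that exactly match the reference answer."""
    if not items or len(predictions) != len(items):
        raise ValueError("predictions and items must be non-empty and aligned")
    correct = sum(p == item.answer for p, item in zip(predictions, items))
    return correct / len(items)


def chance_accuracy(items: Sequence[MCQuestion]) -> float:
    """Expected accuracy of uniform random guessing over each item's options."""
    if not items:
        raise ValueError("items must be non-empty")
    return sum(1 / len(item.options) for item in items) / len(items)


def near_chance(acc: float, chance: float, tolerance: float = 0.05) -> bool:
    """True when accuracy is within an (assumed) tolerance of chance level."""
    return abs(acc - chance) <= tolerance


# Toy data, invented for this sketch (not taken from RadImageNet-VQA).
OPTIONS = ("A", "B", "C", "D")
toy_items = [
    MCQuestion("Which pathology is shown?", OPTIONS, "A"),
    MCQuestion("Which pathology is shown?", OPTIONS, "B"),
    MCQuestion("Which pathology is shown?", OPTIONS, "C"),
    MCQuestion("Which pathology is shown?", OPTIONS, "D"),
]

# Stand-in for a text-only model that always picks the first option,
# as a model might when the question text carries no usable signal.
text_only_predictions = ["A", "A", "A", "A"]

text_only_acc = accuracy(text_only_predictions, toy_items)
chance = chance_accuracy(toy_items)
print(f"text-only accuracy={text_only_acc:.2f}, chance={chance:.2f}, "
      f"near chance={near_chance(text_only_acc, chance)}")
```

In a real check, the toy predictions would be replaced by the outputs of a vision-language model queried without the image. A text-only accuracy well above chance would suggest that the questions leak the answer through wording alone.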

