CV · Apr 15

MApLe: Multi-instance Alignment of Diagnostic Reports and Large Medical Images

arXiv:2604.13970 · 54.9 · h-index: 3 · Has Code
Predicted impact: top 64% in CV · last 90 days
Originality: Incremental advance
AI Analysis

For medical imaging and radiology, MApLe addresses the bottleneck of linking subtle pathological findings in text to small image regions, improving report-image alignment.

MApLe improves alignment between diagnostic reports and large medical images by disentangling anatomical regions and diagnostic findings, achieving better performance than state-of-the-art baselines on downstream tasks.

In diagnostic reports, experts encode complex imaging data into clinically actionable information. They describe subtle pathological findings that are meaningful in their anatomical context. Reports follow relatively consistent structures, expressing diagnostic information with few words that are often associated with tiny but consequential image observations. Standard vision language models struggle to identify the associations between these informative text components and small locations in the images. Here, we propose "MApLe", a multi-task, multi-instance vision language alignment approach that overcomes these limitations. It disentangles the concepts of anatomical region and diagnostic finding, and links local image information to sentences in a patch-wise approach. Our method consists of a text embedding trained to capture anatomical and diagnostic concepts in sentences, a patch-wise image encoder conditioned on anatomical structures, and a multi-instance alignment of these representations. We demonstrate that MApLe can successfully align different image regions and multiple diagnostic findings in free-text reports. We show that our model improves the alignment performance compared to state-of-the-art baseline models when evaluated on several downstream tasks. The code is available at https://github.com/cirmuw/MApLe.
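The multi-instance alignment idea in the abstract can be illustrated with a toy example. The sketch below is not taken from the MApLe repository and makes several assumptions: sentences and image patches are already embedded in a shared space, similarity is cosine, each sentence is matched to its single best patch (max-pooling over the bag of patches), and the report-level score is the mean over sentences. The function names `cosine`, `align_sentences_to_patches`, and `report_image_score` are hypothetical, and the conditioning on anatomical structures described in the abstract is omitted.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def align_sentences_to_patches(sentence_embs, patch_embs):
    """Match each sentence to its most similar patch (assumed max-pooling rule).

    Returns one (best_patch_index, similarity) pair per sentence.
    """
    if not sentence_embs or not patch_embs:
        raise ValueError("need at least one sentence and one patch embedding")
    matches = []
    for sentence in sentence_embs:
        sims = [cosine(sentence, patch) for patch in patch_embs]
        best = max(range(len(sims)), key=sims.__getitem__)
        matches.append((best, sims[best]))
    return matches


def report_image_score(sentence_embs, patch_embs):
    """Report-level score: mean of the per-sentence best-patch similarities."""
    matches = align_sentences_to_patches(sentence_embs, patch_embs)
    return sum(sim for _, sim in matches) / len(matches)


# Toy data: three patch embeddings and two sentence embeddings (made up for illustration).
patches = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
sentences = [[0.1, 0.9, 0.0], [0.0, 0.2, 0.8]]

matches = align_sentences_to_patches(sentences, patches)
score = report_image_score(sentences, patches)
```

In this toy setup the first sentence is matched to the second patch and the second sentence to the third patch. In a trained model, a score of this kind would typically feed a contrastive objective over matched and mismatched report-image pairs; the abstract does not specify the loss, so that part is left out here.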

