Anatomical grounding pre-training for medical phrase grounding
This work targets data scarcity in medical phrase grounding, giving researchers and practitioners in medical imaging a pre-training method that improves MPG models. The contribution is incremental, since it builds on existing pre-training paradigms.
The paper tackles the scarcity of annotated data in Medical Phrase Grounding (MPG) by proposing anatomical grounding as an in-domain pre-training task, which aligns anatomical terms with image regions using large-scale datasets. This pre-training significantly improved results in both zero-shot and fine-tuning settings, and the fine-tuned model achieved state-of-the-art performance on MS-CXR with an mIoU of 61.2.
Medical Phrase Grounding (MPG) maps radiological findings described in medical reports to specific regions in medical images. The primary obstacle to progress in MPG is the scarcity of annotated data for training and validation. We propose anatomical grounding as an in-domain pre-training task that aligns anatomical terms with corresponding regions in medical images, leveraging large-scale datasets such as Chest ImaGenome. Our empirical evaluation on MS-CXR demonstrates that anatomical grounding pre-training significantly improves performance in both zero-shot and fine-tuning settings, outperforming state-of-the-art MPG models. Our fine-tuned model achieves state-of-the-art performance on MS-CXR with an mIoU of 61.2, demonstrating the effectiveness of anatomical grounding pre-training for MPG.
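The headline result is reported as mIoU (mean intersection over union), where 61.2 reads as a percentage. As a rough illustration of what that number averages, the sketch below scores one predicted box against one ground-truth box per phrase and averages the IoU over all phrases. It assumes boxes in (x1, y1, x2, y2) pixel format and a single box per phrase. The exact MS-CXR evaluation protocol (for example, handling several ground-truth boxes for one phrase, or computing IoU on masks) may differ, and the helper names `box_iou` and `mean_iou` are hypothetical.

```python
from typing import Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mean_iou(preds: Sequence[Box], gts: Sequence[Box]) -> float:
    """Average IoU over phrase/box pairs, expressed as a percentage.

    Hypothetical simplification: exactly one predicted and one
    ground-truth box per phrase, paired by position.
    """
    if len(preds) != len(gts) or not preds:
        raise ValueError("need equal, non-empty lists of predicted and ground-truth boxes")
    total = sum(box_iou(p, g) for p, g in zip(preds, gts))
    return 100.0 * total / len(preds)


if __name__ == "__main__":
    preds = [(0, 0, 10, 10), (0, 0, 10, 10)]
    gts = [(0, 0, 10, 10), (5, 0, 15, 10)]
    print(round(mean_iou(preds, gts), 2))
```

In this toy example, the first phrase is localized perfectly (IoU 1.0) and the second overlaps half of its target (IoU 1/3), so the mean is about 66.67.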