CV · AI · Apr 23, 2024

Grounded Knowledge-Enhanced Medical Vision-Language Pre-training for Chest X-Ray

arXiv:2404.14750v2 · 3 citations · h-index: 9 · Biomedical Signal Processing and Control
Originality: Incremental advance
AI Analysis

This work addresses biases and alignment issues in medical AI for chest X-ray analysis, offering incremental improvements over existing methods.

The authors address redundant information in medical vision-language pre-training for chest X-rays by proposing a grounded knowledge-enhanced framework. It achieved competitive or state-of-the-art performance on disease classification, disease localization, report generation, and medical visual question-answering.

Medical foundation models have the potential to revolutionize healthcare by providing robust and generalized representations of medical data. Medical vision-language pre-training has emerged as a promising approach for learning domain-general representations of medical images and text. However, current algorithms that exploit global and local alignment between medical images and text can be marred by redundant information in medical data. To address this issue, we propose a grounded knowledge-enhanced medical vision-language pre-training (GK-MVLP) framework for chest X-ray. In this framework, medical knowledge is grounded to the appropriate anatomical regions by a transformer-based grounded knowledge-enhanced module, which performs fine-grained alignment between textual features of medical knowledge and the corresponding anatomical region-level visual features. GK-MVLP was competitive with or exceeded the state of the art on downstream image understanding tasks (chest X-ray disease classification and disease localization), a generative task (report generation), and a vision-language understanding task (medical visual question-answering). Our results demonstrate the advantage of incorporating a grounding mechanism to remove biases and improve the alignment between chest X-ray images and radiology reports.
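To make the idea of grounding knowledge to anatomical regions concrete, here is a minimal sketch of one cross-attention step in plain Python. It is not the authors' implementation: the function name `ground_knowledge`, the toy feature vectors, and the residual fusion are assumptions for illustration, and the learned projections, multiple heads, and training objective of a real transformer module are omitted. Each region-level visual feature acts as a query over the knowledge text features, so each region ends up weighted toward the knowledge entries most similar to it.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def ground_knowledge(region_feats, knowledge_feats):
    """Hypothetical helper: scaled dot-product cross-attention.

    region_feats:    list of region-level visual feature vectors (queries)
    knowledge_feats: list of knowledge text feature vectors (keys and values)
    Returns (fused_feats, attention_weights).
    """
    d = len(region_feats[0])
    scale = math.sqrt(d)
    fused, weights = [], []
    for q in region_feats:
        # How strongly this region attends to each knowledge entry.
        attn = softmax([dot(q, k) / scale for k in knowledge_feats])
        # Attention-weighted sum of knowledge features.
        context = [sum(w * k[i] for w, k in zip(attn, knowledge_feats))
                   for i in range(d)]
        # Residual fusion (an assumption, not taken from the paper).
        fused.append([qi + ci for qi, ci in zip(q, context)])
        weights.append(attn)
    return fused, weights

# Toy example: two regions and two knowledge entries in a 2-d feature space.
regions = [[1.0, 0.0], [0.0, 1.0]]
knowledge = [[4.0, 0.0], [0.0, 4.0]]
fused, weights = ground_knowledge(regions, knowledge)
```

In this toy input the first region attends mostly to the first knowledge entry and the second region to the second, which is the region-to-knowledge pairing a grounding module is meant to learn.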

