CV · Apr 4, 2024

DeViDe: Faceted medical knowledge for improved medical vision-language pre-training

arXiv:2404.03618v1 · 11 citations · h-index: 69
Originality: Incremental advance
AI Analysis

This work addresses a gap in medical AI by improving how medical knowledge is represented for chest X-ray analysis. The contribution is incremental, however, since it builds on existing vision-language pre-training methods.

The paper tackles the challenge of encoding medical knowledge effectively in vision-language pre-training for chest X-rays. It proposes DeViDe, a transformer-based method that combines radiographic descriptions collected from the web with abstract definitions and radiology reports. The method reports state-of-the-art zero-shot results on three large-scale datasets and superior performance when fine-tuned on downstream tasks.

Vision-language pre-training for chest X-rays has made significant strides, primarily by using paired radiographs and radiology reports. However, existing approaches often struggle to encode medical knowledge effectively. Radiology reports describe how a disease manifests in the current image, whereas medical definitions (as used by contemporary methods) tend to be overly abstract, leaving a gap in knowledge between the two. To address this, we propose DeViDe, a novel transformer-based method that leverages radiographic descriptions from the open web. These descriptions outline the general visual characteristics of diseases in radiographs and, when combined with abstract definitions and radiology reports, provide a holistic snapshot of knowledge. DeViDe incorporates three key features for knowledge-augmented vision-language alignment. First, a large language model (LLM)-based augmentation is employed to homogenise medical knowledge from diverse sources. Second, this knowledge is aligned with image information at various levels of granularity. Third, a novel projection layer is proposed to handle the complexity of aligning each image with the multiple descriptions that arise in a multi-label setting. In zero-shot settings, DeViDe performs comparably to fully supervised models on external datasets and achieves state-of-the-art results on three large-scale datasets. Additionally, fine-tuning DeViDe on four downstream tasks and six segmentation tasks demonstrates its superior performance across data from diverse distributions.
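To make the multi-label alignment idea concrete, here is one generic way an image embedding can be scored against several disease descriptions at once. This is a minimal sketch under stated assumptions, not DeViDe's implementation. Toy vectors stand in for encoder outputs, and multiple descriptions of one disease are simply averaged. A per-pair sigmoid (binary cross-entropy) loss stands in for the paper's projection layer. Zero-shot scoring compares the image against hypothetical "finding present" and "finding absent" descriptions, a common CLIP-style recipe. All function names and the temperature value are hypothetical.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical toy setup: plain Python lists stand in for the outputs of
# image and text encoders. This is NOT DeViDe's projection layer.
Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors of equal length."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def aggregate_descriptions(embs: List[Vector]) -> List[float]:
    """Assumption: fuse several descriptions of one disease (definition,
    web description, LLM rewrite) by averaging their unit-length embeddings."""
    unit = [l2_normalize(e) for e in embs]
    return [sum(col) / len(unit) for col in zip(*unit)]


def bce_with_logits(z: float, y: int) -> float:
    """Numerically stable binary cross-entropy on a raw logit z with target y."""
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))


def multilabel_alignment_loss(
    image_emb: Vector,
    desc_embs: Dict[str, Vector],
    labels: Dict[str, int],
    temperature: float = 0.1,
) -> float:
    """Mean per-pair loss over all disease descriptions for one image.

    Each (image, description) pair is an independent binary problem, so an
    image with several findings can be pulled toward several descriptions
    at once instead of competing in a single softmax.
    """
    losses = [
        bce_with_logits(cosine(image_emb, d) / temperature, labels[disease])
        for disease, d in desc_embs.items()
    ]
    return sum(losses) / len(losses)


def zero_shot_probs(
    image_emb: Vector,
    pos_embs: Dict[str, Vector],
    neg_embs: Dict[str, Vector],
    temperature: float = 0.1,
) -> Dict[str, float]:
    """Per-disease probability from a two-way softmax over
    'finding present' vs. 'finding absent' description embeddings."""
    probs = {}
    for disease, pos in pos_embs.items():
        s_pos = cosine(image_emb, pos) / temperature
        s_neg = cosine(image_emb, neg_embs[disease]) / temperature
        # exp(s_pos) / (exp(s_pos) + exp(s_neg)) rewritten as a sigmoid
        probs[disease] = 1.0 / (1.0 + math.exp(s_neg - s_pos))
    return probs


if __name__ == "__main__":
    image = [1.0, 0.0, 0.0]
    descriptions = {
        "effusion": aggregate_descriptions([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]),
        "pneumothorax": [0.0, 1.0, 0.0],
    }
    print(multilabel_alignment_loss(image, descriptions, {"effusion": 1, "pneumothorax": 0}))
    print(zero_shot_probs(image, {"effusion": [1.0, 0.0, 0.0]}, {"effusion": [0.0, 1.0, 0.0]}))
```

Running the file prints the mean alignment loss and the zero-shot probability for the toy inputs. The independent per-pair loss is used here only because it tolerates several positive descriptions per image. The paper describes its own projection layer as a novel component, and that layer may work quite differently.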

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
