Anand Shah

h-index: 27

4 papers

145 citations

Novelty: 41%

AI Score: 32

Ranked #126,227 of 194,257 authors (top 65%); #41,831 in CV (top 71%)

4 Papers

15.7 | CV | Oct 11, 2023 | Code

IMITATE: Clinical Prior Guided Hierarchical Vision-Language Pre-training

Che Liu, Sibo Cheng, Miaojing Shi et al.

In the field of medical Vision-Language Pre-training (VLP), significant efforts have been devoted to deriving text and image features from both clinical reports and associated medical images. However, most existing methods may have overlooked the opportunity to leverage the inherent hierarchical structure of clinical reports, which are generally split into 'findings' for descriptive content and 'impressions' for conclusive observation. Instead of utilizing this rich, structured format, current medical VLP approaches often simplify the report into either a unified entity or fragmented tokens. In this work, we propose a novel clinical prior guided VLP framework named IMITATE to learn the structure information from medical reports with hierarchical vision-language alignment. The framework derives multi-level visual features from the chest X-ray (CXR) images and separately aligns these features with the descriptive and the conclusive text encoded in the hierarchical medical report. Furthermore, a new clinical-informed contrastive loss is introduced for cross-modal learning, which accounts for clinical prior knowledge in formulating sample correlations in contrastive learning. The proposed model, IMITATE, outperforms baseline VLP methods across six different datasets, spanning five medical imaging downstream tasks. Comprehensive experimental results highlight the advantages of integrating the hierarchical structure of medical reports for vision-language alignment. The code related to this paper is available at https://github.com/cheliu-computation/IMITATE-TMI2024.

12.6 | CV | Oct 10, 2023 | Code

Utilizing Synthetic Data for Medical Vision-Language Pre-training: Bypassing the Need for Real Images

Che Liu, Anand Shah, Wenjia Bai et al.

Medical Vision-Language Pre-training (VLP) learns representations jointly from medical images and paired radiology reports. It typically requires large-scale paired image-text datasets to achieve effective pre-training for both the image encoder and text encoder. The advent of text-guided generative models raises a compelling question: Can VLP be implemented solely with synthetic images generated from genuine radiology reports, thereby mitigating the need for extensively pairing and curating image-text datasets? In this work, we scrutinize this very question by examining the feasibility and effectiveness of employing synthetic images for medical VLP. We replace real medical images with their synthetic equivalents, generated from authentic medical reports. Utilizing three state-of-the-art VLP algorithms, we exclusively train on these synthetic samples. Our empirical evaluation across three subsequent tasks, namely image classification, semantic segmentation and object detection, reveals that the performance achieved through synthetic data is on par with or even exceeds that obtained with real images. As a pioneering contribution to this domain, we introduce a large-scale synthetic medical image dataset, paired with anonymized real radiology reports. This alleviates the need to share medical images, which are not easy to curate and share in practice. The code and the dataset can be found at https://github.com/cheliu-computation/MedSyn-RepLearn/tree/main.

21.5 | CV | Jul 17, 2023 | Code

M-FLAG: Medical Vision-Language Pre-training with Frozen Language Models and Latent Space Geometry Optimization

Che Liu, Sibo Cheng, Chen Chen et al.

Medical vision-language models enable co-learning and integrating features from medical imaging and clinical text. However, these models are not easy to train and the latent representation space can be complex. Here we propose a novel way for pre-training and regularising medical vision-language models. The proposed method, named Medical vision-language pre-training with Frozen language models and Latent spAce Geometry optimization (M-FLAG), leverages a frozen language model for training stability and efficiency and introduces a novel orthogonality loss to harmonize the latent space geometry. We demonstrate the potential of the pre-trained model on three downstream tasks: medical image classification, segmentation, and object detection. Extensive experiments across five public datasets demonstrate that M-FLAG significantly outperforms existing medical vision-language pre-training approaches and reduces the number of parameters by 78%. Notably, M-FLAG achieves outstanding performance on the segmentation task while using only 1% of the RSNA dataset, even outperforming ImageNet pre-trained models that have been fine-tuned using 100% of the data.
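The abstract above mentions an orthogonality loss on the latent space but does not give its form. As a rough illustration only, the sketch below shows one generic orthogonality penalty: the squared Frobenius distance between the Gram matrix of unit-normalized latent dimensions and the identity. The function name `orthogonality_penalty`, the normalization step, and the `eps` parameter are assumptions made for this example and are not taken from the M-FLAG paper or its code.

```python
import numpy as np


def orthogonality_penalty(z, eps=1e-8):
    """Generic orthogonality penalty (illustrative; not the M-FLAG formulation).

    z: array of shape (batch, dim) holding latent features.
    Returns the squared Frobenius norm of (G - I), where G is the Gram
    matrix of the latent dimensions after each column is scaled to unit
    L2 norm. The value is 0 when the latent dimensions are mutually
    orthogonal across the batch.
    """
    z = np.asarray(z, dtype=float)
    # Scale every latent dimension (column) to unit length; eps guards
    # against division by zero for an all-zero column.
    norms = np.linalg.norm(z, axis=0, keepdims=True)
    z_unit = z / np.maximum(norms, eps)
    # Gram matrix of shape (dim, dim): pairwise cosine similarity of dimensions.
    gram = z_unit.T @ z_unit
    identity = np.eye(gram.shape[0])
    return float(np.sum((gram - identity) ** 2))
```

In a training loop, a term of this kind would typically be added to the alignment loss with a weighting coefficient; that weighting is likewise an assumption and not something stated in the abstract.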

2.0 | IV | Feb 6, 2019

SAPSAM - Sparsely Annotated Pathological Sign Activation Maps - A novel approach to train Convolutional Neural Networks on lung CT scans using binary labels only

Mario Zusag, Sujal Desai, Marcello Di Paolo et al.

Chronic Pulmonary Aspergillosis (CPA) is a complex lung disease caused by infection with Aspergillus. Computed tomography (CT) images are frequently requested in patients with suspected and established disease, but the radiological signs on CT are difficult to quantify, making accurate follow-up challenging. We propose a novel method to train Convolutional Neural Networks using only regional labels on the presence of pathological signs, to not only detect CPA, but also spatially localize pathological signs. We use average intensity projections within different ranges of Hounsfield unit (HU) values, transforming input 3D CT scans into 2D RGB-like images. CNN architectures are trained for hierarchical tasks, leading to precise activation maps of pathological patterns. Results on a cohort of 352 subjects demonstrate high classification accuracy, localization precision and predictive power for 2-year survival. Such a tool opens the way to CPA patient stratification and quantitative follow-up of CPA pathological signs, for patients under drug therapy.
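The projection step described in this abstract (averaging intensities within several HU ranges to turn a 3D scan into a 2D multi-channel image) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `hu_range_projection`, the projection axis, the rescaling of each channel to [0, 1], and the fallback value for positions with no in-range voxel are all choices made for this example. The abstract does not list the HU ranges, so any ranges passed in are hypothetical.

```python
import numpy as np


def hu_range_projection(volume_hu, hu_ranges, axis=0):
    """Collapse a 3D CT volume (in HU) into a 2D multi-channel image.

    For every (low, high) range, voxels whose HU value lies inside the
    range are averaged along `axis`; the mean is then rescaled to [0, 1]
    relative to that range. Positions with no in-range voxel get 0.
    Returns an array of shape (H, W, len(hu_ranges)).
    """
    volume_hu = np.asarray(volume_hu, dtype=float)
    channels = []
    for low, high in hu_ranges:
        in_range = (volume_hu >= low) & (volume_hu <= high)
        # Sum and count only the in-range voxels along the projection axis.
        total = np.where(in_range, volume_hu, 0.0).sum(axis=axis)
        count = in_range.sum(axis=axis)
        # Average of in-range voxels; fall back to `low` where none exist
        # so that the rescaled value becomes 0.
        mean = np.where(count > 0, total / np.maximum(count, 1), low)
        channels.append((mean - low) / (high - low))
    return np.stack(channels, axis=-1)
```

With three ranges the result has three channels and can be fed to a standard 2D CNN that expects RGB-like input, which matches the "2D RGB-like images" wording in the abstract.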