CV AIOct 26, 2024

Diff-CXR: Report-to-CXR generation through a disease-knowledge enhanced diffusion model

Peng Huang, Bowen Guo, Shuyu Liang, Junhu Fu, Yuanyuan Wang, Yi Guo

arXiv:2410.20165v13.71 citationsh-index: 25

Originality Incremental advance

AI Analysis

This work addresses the challenge of controlled and diverse image generation in medical imaging, with potential applications in clinical settings, though it is incremental in improving existing text-to-image methods.

The paper tackles the problem of generating chest X-ray images from medical reports by proposing Diff-CXR, a disease-knowledge enhanced diffusion model, which outperforms previous state-of-the-art methods by up to 56.4% in mAUC score on benchmarks.

Text-To-Image (TTI) generation is significant for controlled and diverse image generation with broad potential applications. Although current medical TTI methods have made some progress in report-to-Chest-Xray (CXR) generation, their generation performance may be limited due to the intrinsic characteristics of medical data. In this paper, we propose a novel disease-knowledge enhanced Diffusion-based TTI learning framework, named Diff-CXR, for medical report-to-CXR generation. First, to minimize the negative impacts of noisy data on generation, we devise a Latent Noise Filtering Strategy that gradually learns the general patterns of anomalies and removes them in the latent space. Then, an Adaptive Vision-Aware Textual Learning Strategy is designed to learn concise and important report embeddings in a domain-specific Vision-Language Model, providing textual guidance for Chest-Xray generation. Finally, by incorporating the general disease knowledge into the pretrained TTI model via a delicate control adapter, a disease-knowledge enhanced diffusion model is introduced to achieve realistic and precise report-to-CXR generation. Experimentally, our Diff-CXR outperforms previous SOTA medical TTI methods by 33.4\% / 8.0\% and 23.8\% / 56.4\% in the FID and mAUC score on MIMIC-CXR and IU-Xray, with the lowest computational complexity at 29.641 GFLOPs. Downstream experiments on three thorax disease classification benchmarks and one CXR-report generation benchmark demonstrate that Diff-CXR is effective in improving classical CXR analysis methods. Notably, models trained on the combination of 1\% real data and synthetic data can achieve a competitive mAUC score compared to models trained on all data, presenting promising clinical applications.

View on arXiv PDF

Similar