CVAIAug 22, 2025

Improving Performance, Robustness, and Fairness of Radiographic AI Models with Finely-Controllable Synthetic Data

Stanford
arXiv:2508.16783v16 citationsh-index: 21Has CodeRes Sq
Originality Highly original
AI Analysis

This work addresses the problem of limited dataset diversity and fairness in medical AI for clinicians and patients, offering a novel method to improve model generalization and equity, though it builds incrementally on synthetic data generation approaches.

The paper tackled the challenge of achieving robust performance and fairness in deep learning models for diagnostic imaging by introducing RoentGen-v2, a text-to-image diffusion model for chest radiographs that generates finely-controllable synthetic data, resulting in a 6.5% accuracy increase and a 19.3% reduction in the underdiagnosis fairness gap for downstream classification models.

Achieving robust performance and fairness across diverse patient populations remains a challenge in developing clinically deployable deep learning models for diagnostic imaging. Synthetic data generation has emerged as a promising strategy to address limitations in dataset scale and diversity. We introduce RoentGen-v2, a text-to-image diffusion model for chest radiographs that enables fine-grained control over both radiographic findings and patient demographic attributes, including sex, age, and race/ethnicity. RoentGen-v2 is the first model to generate clinically plausible images with demographic conditioning, facilitating the creation of a large, demographically balanced synthetic dataset comprising over 565,000 images. We use this large synthetic dataset to evaluate optimal training pipelines for downstream disease classification models. In contrast to prior work that combines real and synthetic data naively, we propose an improved training strategy that leverages synthetic data for supervised pretraining, followed by fine-tuning on real data. Through extensive evaluation on over 137,000 chest radiographs from five institutions, we demonstrate that synthetic pretraining consistently improves model performance, generalization to out-of-distribution settings, and fairness across demographic subgroups. Across datasets, synthetic pretraining led to a 6.5% accuracy increase in the performance of downstream classification models, compared to a modest 2.7% increase when naively combining real and synthetic data. We observe this performance improvement simultaneously with the reduction of the underdiagnosis fairness gap by 19.3%. These results highlight the potential of synthetic imaging to advance equitable and generalizable medical deep learning under real-world data constraints. We open source our code, trained models, and synthetic dataset at https://github.com/StanfordMIMI/RoentGen-v2 .

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes