LG · Nov 2, 2025

MedEqualizer: A Framework Investigating Bias in Synthetic Medical Data and Mitigation via Augmentation

arXiv:2511.01054v1 · h-index: 3
Originality: Incremental advance
AI Analysis

This paper tackles fairness issues in synthetic healthcare data for clinical research. The contribution is incremental: it builds on existing GAN-based methods and adds a mitigation approach.

The study investigates bias in synthetic medical data generated by GAN models using the MIMIC-III dataset and finds significant imbalances in demographic subgroup representation, as measured by logarithmic disparity. To address this, the authors introduce MedEqualizer, an augmentation framework that enriches underrepresented subgroups before generation, which significantly improves demographic balance in the synthetic datasets.

Synthetic healthcare data generation presents a viable approach to enhance data accessibility and support research by overcoming limitations associated with real-world medical datasets. However, ensuring fairness across protected attributes in synthetic data is critical to avoid biased or misleading results in clinical research and decision-making. In this study, we assess the fairness of synthetic data generated by multiple generative adversarial network (GAN)-based models using the MIMIC-III dataset, with a focus on representativeness across protected demographic attributes. We measure subgroup representation using the logarithmic disparity metric and observe significant imbalances, with many subgroups either underrepresented or overrepresented in the synthetic data, compared to the real data. To mitigate these disparities, we introduce MedEqualizer, a model-agnostic augmentation framework that enriches the underrepresented subgroups prior to synthetic data generation. Our results show that MedEqualizer significantly improves demographic balance in the resulting synthetic datasets, offering a viable path towards more equitable and representative healthcare data synthesis.
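To make the two ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes the logarithmic disparity of a subgroup can be approximated as the log of the ratio between its share in the synthetic data and its share in the real data (the paper's exact definition may differ), and it stands in for the augmentation step with naive duplication of records from small subgroups. The function names, the `eps` smoothing term, and the `min_count` threshold are all hypothetical.

```python
import math
from collections import Counter


def subgroup_proportions(records, attrs):
    """Share of records falling in each combination of protected attributes."""
    counts = Counter(tuple(r[a] for a in attrs) for r in records)
    total = sum(counts.values())
    return {group: c / total for group, c in counts.items()}


def log_disparity(real, synth, attrs, eps=1e-9):
    """Assumed form: log(p_synth / p_real) per subgroup found in the real data.

    Negative values mean the subgroup is underrepresented in the synthetic
    data, positive values mean it is overrepresented. eps avoids log(0) when
    a subgroup is missing from the synthetic data entirely.
    """
    p_real = subgroup_proportions(real, attrs)
    p_synth = subgroup_proportions(synth, attrs)
    return {
        group: math.log((p_synth.get(group, 0.0) + eps) / (p + eps))
        for group, p in p_real.items()
    }


def oversample_minorities(records, attrs, min_count):
    """Naive stand-in for augmentation: duplicate records of each subgroup
    that has fewer than min_count members until it reaches min_count."""
    groups = {}
    for r in records:
        groups.setdefault(tuple(r[a] for a in attrs), []).append(r)
    augmented = list(records)
    for members in groups.values():
        deficit = max(min_count - len(members), 0)
        for i in range(deficit):
            # Cycle through the existing members and copy them.
            augmented.append(dict(members[i % len(members)]))
    return augmented
```

In this sketch the augmented records would be passed to whichever generator is used, which is the sense in which an augmentation step applied before generation can be model-agnostic.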

