LG | CV | Mar 10, 2023

EHRDiff: Exploring Realistic EHR Synthesis with Diffusion Models

arXiv:2303.05656v3 | 32 citations | h-index: 18
Originality: Incremental advance
AI Analysis

This work addresses privacy barriers for researchers in precision medicine by improving synthetic EHR data generation, though it is incremental as it applies an existing generative modeling technique to a new domain.

The paper tackles the problem of limited access to high-quality electronic health records (EHR) due to privacy concerns by proposing EHRDiff, a diffusion model for EHR synthesis, which achieves new state-of-the-art quality in generating synthetic EHR data.

Electronic health records (EHR) contain a wealth of biomedical information, serving as valuable resources for the development of precision medicine systems. However, privacy concerns have resulted in limited access to high-quality and large-scale EHR data for researchers, impeding progress in methodological development. Recent research has delved into synthesizing realistic EHR data through generative modeling techniques, with most proposed methods relying on generative adversarial networks (GANs) and their variants. Although GAN-based methods attain state-of-the-art performance in generating EHR data, they are difficult to train and prone to mode collapse. Diffusion models, a more recent class of generative models, have established cutting-edge performance in image generation, but their efficacy in EHR data synthesis remains largely unexplored. In this study, we investigate the potential of diffusion models for EHR data synthesis and introduce a novel method, EHRDiff. Through extensive experiments, EHRDiff establishes new state-of-the-art quality for synthetic EHR data while also protecting private information.

Code Implementations: 1 repo
