LG · AI · Nov 20, 2024

SynEHRgy: Synthesizing Mixed-Type Structured Electronic Health Records using Decoder-Only Transformers

arXiv:2411.13428v1 · 7 citations · h-index: 3
Originality: Incremental advance
AI Analysis

This work addresses data augmentation and privacy concerns in healthcare for researchers and practitioners, but it appears incremental because it builds on existing transformer methods.

The paper tackles the generation of synthetic Electronic Health Records (EHRs) by proposing a novel tokenization strategy for mixed-type structured data and training a GPT-like decoder-only transformer on the resulting token sequences. The generated records are benchmarked for fidelity, utility, and privacy on the MIMIC-III dataset.

Generating synthetic Electronic Health Records (EHRs) offers significant potential for data augmentation, privacy-preserving data sharing, and improving machine learning model training. We propose a novel tokenization strategy tailored for structured EHR data, which encompasses diverse data types such as covariates, ICD codes, and irregularly sampled time series. Using a GPT-like decoder-only transformer model, we demonstrate the generation of high-quality synthetic EHRs. Our approach is evaluated using the MIMIC-III dataset, and we benchmark the fidelity, utility, and privacy of the generated data against state-of-the-art models.
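To make the idea of tokenizing mixed-type structured data concrete, here is a minimal sketch of how covariates, ICD codes, and an irregularly sampled time series could be flattened into one token sequence for a decoder-only model. This is not the paper's tokenization scheme, which the abstract does not specify: the bin edges, the time-gap buckets, the token spellings, and the names `bin_token`, `time_gap_token`, and `tokenize_record` are all hypothetical choices made for illustration.

```python
from bisect import bisect_right

# Hypothetical bin edges per numeric variable (assumption, not from the paper).
BIN_EDGES = {"HR": [60.0, 80.0, 100.0, 120.0]}


def bin_token(name, value, edges):
    """Discretize a continuous value into a token naming its bin index."""
    return f"{name}_BIN{bisect_right(edges, value)}"


def time_gap_token(hours):
    """Encode the gap since the previous measurement as a coarse bucket token."""
    if hours < 1:
        return "DT_<1H"
    if hours < 6:
        return "DT_1-6H"
    return "DT_>=6H"


def tokenize_record(record):
    """Flatten one patient record into a single token sequence."""
    tokens = ["<BOS>"]
    # Static covariates become KEY=value tokens.
    for key, value in record["covariates"].items():
        tokens.append(f"{key.upper()}={value}")
    # Each diagnosis code becomes its own categorical token.
    tokens.extend(f"ICD_{code}" for code in record["icd_codes"])
    # Time-series events: emit a time-gap token, then the binned value.
    prev_time = 0.0
    for time, name, value in sorted(record["events"]):
        tokens.append(time_gap_token(time - prev_time))
        tokens.append(bin_token(name, value, BIN_EDGES[name]))
        prev_time = time
    tokens.append("<EOS>")
    return tokens


# Made-up example record; event times are hours since admission.
example_record = {
    "covariates": {"sex": "F"},
    "icd_codes": ["4019"],
    "events": [(0.5, "HR", 72.0), (7.0, "HR", 130.0)],
}
example_tokens = tokenize_record(example_record)
```

Representing time gaps as explicit tokens is one simple way to keep irregular sampling visible to an autoregressive model; a real system would also need a vocabulary mapping tokens to integer ids and an inverse mapping to decode generated sequences back into records.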

