Feb 6, 2024

CEHR-GPT: Generating Electronic Health Records with Chronological Patient Timelines

arXiv:2402.04400v2 · 34 citations · h-index: 80
AI Analysis

The paper addresses the need for realistic synthetic EHR data among healthcare researchers who lack direct data access, though the contribution appears incremental because it builds on existing GPT and CEHR-BERT approaches.

The paper tackles the problem of generating synthetic Electronic Health Records (EHR) that preserve temporal dependencies in patient histories. It trains a GPT model on a patient representation derived from CEHR-BERT, so that the generated sequences can be converted to the OMOP format.
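To make "chronological patient timeline" concrete, here is a minimal Python sketch of flattening one patient's visits into a single token sequence that a GPT-style model could be trained on. It is not the paper's implementation: the `Visit` class, the `time_token` and `encode_patient` helpers, the `year:`/`age:`/`gender:`/`race:` prompt tokens, the `VS`/`VE` visit markers and the day/week/month/long-term interval buckets are all hypothetical simplifications, loosely in the spirit of the artificial time tokens used by CEHR-BERT.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Visit:
    """One visit in a patient's timeline (hypothetical, simplified schema)."""
    start: date
    visit_type: str       # OMOP visit concept ID as a string, e.g. "9202"
    concepts: List[str]   # concept IDs recorded during the visit


def time_token(gap_days: int) -> str:
    """Map the gap between consecutive visits to a coarse interval token.

    The bucket boundaries are an illustrative assumption, not the paper's scheme.
    """
    if gap_days < 7:
        return f"D{gap_days}"          # days
    if gap_days < 28:
        return f"W{gap_days // 7}"     # weeks
    if gap_days < 360:
        return f"M{gap_days // 30}"    # months
    return "LT"                        # long-term gap


def encode_patient(birth_year: int, gender: str, race: str,
                   visits: List[Visit]) -> List[str]:
    """Flatten a patient's visits into one chronological token sequence."""
    if not visits:
        raise ValueError("patient has no visits")
    visits = sorted(visits, key=lambda v: v.start)
    first = visits[0].start
    age = first.year - birth_year  # approximate age, computed from years only
    # Hypothetical demographic prompt that opens every sequence.
    tokens = [f"year:{first.year}", f"age:{age}", f"gender:{gender}", f"race:{race}"]
    prev = None
    for visit in visits:
        if prev is not None:
            # An explicit interval token encodes how much time passed.
            tokens.append(time_token((visit.start - prev).days))
        tokens += ["VS", visit.visit_type, *visit.concepts, "VE"]
        prev = visit.start
    return tokens


example = [
    Visit(date(2020, 1, 1), "9202", ["320128"]),
    Visit(date(2020, 1, 11), "9202", ["201826"]),
]
print(encode_patient(1960, "F", "white", example))
# ['year:2020', 'age:60', 'gender:F', 'race:white', 'VS', '9202', '320128', 'VE',
#  'W1', 'VS', '9202', '201826', 'VE']
```

Inserting interval tokens between visits is what lets a plain next-token model pick up temporal dependencies that a flat tabular export of the same records would discard.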

Synthetic Electronic Health Records (EHR) have emerged as a pivotal tool in advancing healthcare applications and machine learning models, particularly for researchers without direct access to healthcare data. Although existing methods, like rule-based approaches and generative adversarial networks (GANs), generate synthetic data that resembles real-world EHR data, these methods often use a tabular format, disregarding temporal dependencies in patient histories and limiting data replication. Recently, there has been a growing interest in leveraging Generative Pre-trained Transformers (GPT) for EHR data. This enables applications like disease progression analysis, population estimation, counterfactual reasoning, and synthetic data generation. In this work, we focus on synthetic data generation and demonstrate the capability of training a GPT model using a particular patient representation derived from CEHR-BERT, enabling us to generate patient sequences that can be seamlessly converted to the Observational Medical Outcomes Partnership (OMOP) data format.
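The conversion back to OMOP can be sketched in the same hedged spirit. The decoder below walks a generated token sequence and emits rows shaped like the OMOP CDM `visit_occurrence` and `condition_occurrence` tables. It assumes a hypothetical token scheme (four demographic prompt tokens starting with `year:<YYYY>`, `VS`/`VE` visit markers, `D<n>`/`W<n>`/`M<n>`/`LT` interval tokens), anchors the timeline at 1 January of the prompt year, maps each interval bucket back to a fixed number of days, and writes every in-visit concept as a condition. A real converter would route concepts to tables by domain; `token_to_days` and `decode_to_omop` are illustrative names, not the paper's code.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple


def token_to_days(token: str) -> int:
    """Map a hypothetical interval token back to an approximate gap in days.

    Assumed mapping: D<n> -> n days, W<n> -> 7n, M<n> -> 30n, LT -> 365.
    """
    if token == "LT":
        return 365
    scale = {"D": 1, "W": 7, "M": 30}
    unit, count = token[:1], token[1:]
    if unit not in scale or not count.isdigit():
        raise ValueError(f"unexpected token between visits: {token!r}")
    return scale[unit] * int(count)


def decode_to_omop(tokens: List[str], person_id: int) -> Tuple[List[Dict], List[Dict]]:
    """Turn a generated token sequence into OMOP-like visit and condition rows."""
    # The prompt only carries the year, so anchor the timeline at 1 January (assumption).
    current = date(int(tokens[0].split(":", 1)[1]), 1, 1)
    visits: List[Dict] = []
    conditions: List[Dict] = []
    i = 4  # skip the four demographic prompt tokens
    while i < len(tokens):
        if tokens[i] == "VS":
            end = tokens.index("VE", i)  # raises ValueError if the visit is never closed
            visit_id = len(visits) + 1
            visits.append({
                "visit_occurrence_id": visit_id,
                "person_id": person_id,
                "visit_concept_id": int(tokens[i + 1]),
                "visit_start_date": current,
            })
            # Simplification: every concept inside the visit becomes a condition row.
            for concept in tokens[i + 2:end]:
                conditions.append({
                    "condition_occurrence_id": len(conditions) + 1,
                    "person_id": person_id,
                    "condition_concept_id": int(concept),
                    "condition_start_date": current,
                    "visit_occurrence_id": visit_id,
                })
            i = end + 1
        else:
            # Anything between visits must be an interval token: advance the clock.
            current += timedelta(days=token_to_days(tokens[i]))
            i += 1
    return visits, conditions


example = ["year:2020", "age:60", "gender:F", "race:white",
           "VS", "9202", "320128", "VE", "W1", "VS", "9202", "201826", "VE"]
visits, conditions = decode_to_omop(example, person_id=1)
print(visits[1]["visit_start_date"])  # 2020-01-08
```

The reconstruction is deliberately lossy: the second visit lands on 8 January whatever the original gap within the `W1` bucket was, which is the price of coarse interval tokens. Malformed generations (an unclosed visit, an unknown token between visits) raise `ValueError`, so they can be filtered out before loading into an OMOP database.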
