CL | AI | LG | Apr 20, 2025

A Case Study Exploring the Current Landscape of Synthetic Medical Record Generation with Commercial LLMs

arXiv:2504.14657v2 | 12 citations | h-index: 1 | CHIL
Originality: Synthesis-oriented
AI Analysis

This work addresses the challenge of creating privacy-preserving synthetic healthcare data that generalizes across hospitals, though its contribution is incremental: it assesses the limitations of current LLMs rather than proposing a new method.

The study evaluated commercial LLMs for generating synthetic electronic health records, finding that while they work well for small feature sets, they struggle to maintain realistic distributions and correlations as data dimensionality increases, limiting cross-hospital generalization.

Synthetic Electronic Health Records (EHRs) offer a valuable opportunity to create privacy-preserving and harmonized structured data, supporting numerous applications in healthcare. Key benefits of synthetic data include precise control over the data schema, improved fairness and representation of patient populations, and the ability to share datasets without concerns about compromising real individuals' privacy. Consequently, the AI community has increasingly turned to Large Language Models (LLMs) to generate synthetic data across various domains. However, a significant challenge in healthcare is ensuring that synthetic health records reliably generalize across different hospitals, a long-standing issue in the field. In this work, we evaluate the current state of commercial LLMs for generating synthetic data and investigate multiple aspects of the generation process to identify areas where these models excel and where they fall short. Our main finding is that while LLMs can reliably generate synthetic health records for smaller subsets of features, they struggle to preserve realistic distributions and correlations as the dimensionality of the data increases, ultimately limiting their ability to generalize across diverse hospital settings.
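The abstract's central claim concerns how well synthetic records preserve per-feature distributions and between-feature correlations. As a rough illustration of what such a check could look like, the sketch below compares a real and a synthetic table using a two-sample Kolmogorov-Smirnov statistic for marginals and the mean absolute gap between pairwise Pearson correlations. This is not the paper's evaluation protocol; the function names, the dict-of-columns table layout, and the toy data are all assumptions made for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def ks_statistic(xs, ys):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    xs, ys = sorted(xs), sorted(ys)
    i = j = 0
    d = 0.0
    while i < len(xs) and j < len(ys):
        v = min(xs[i], ys[j])
        # Advance both samples past every value <= v so ties are handled.
        while i < len(xs) and xs[i] <= v:
            i += 1
        while j < len(ys) and ys[j] <= v:
            j += 1
        d = max(d, abs(i / len(xs) - j / len(ys)))
    return d


def correlation_gap(real, synthetic):
    """Mean absolute difference between pairwise correlations of two tables.

    Each table is a dict mapping a feature name to a list of values
    (a hypothetical layout chosen for this sketch).
    """
    features = sorted(real)
    gaps = []
    for idx, a in enumerate(features):
        for b in features[idx + 1:]:
            r_real = pearson(real[a], real[b])
            r_syn = pearson(synthetic[a], synthetic[b])
            gaps.append(abs(r_real - r_syn))
    return sum(gaps) / len(gaps)


def toy_tables(n=500, seed=0):
    """Toy data only: 'real' has correlated features, 'synthetic' does not."""
    rng = random.Random(seed)
    age = [rng.uniform(20, 90) for _ in range(n)]
    sbp = [100 + 0.5 * a + rng.gauss(0, 8) for a in age]
    real = {"age": age, "systolic_bp": sbp}
    synthetic = {
        "age": [rng.uniform(20, 90) for _ in range(n)],
        "systolic_bp": [rng.gauss(127, 12) for _ in range(n)],
    }
    return real, synthetic


if __name__ == "__main__":
    real, synthetic = toy_tables()
    for name in sorted(real):
        print(name, "KS:", round(ks_statistic(real[name], synthetic[name]), 3))
    print("correlation gap:", round(correlation_gap(real, synthetic), 3))
```

In this toy setup the synthetic marginals can look plausible while the age and blood pressure relationship is lost, which is the kind of failure the abstract describes becoming more pronounced as the number of features grows.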

