LG · CL · Nov 19, 2024

A Review on Generative AI Models for Synthetic Medical Text, Time Series, and Longitudinal Data

arXiv:2411.12274v1 · 27 citations · h-index: 16 · npj Digital Medicine
Originality: Synthesis-oriented
AI Analysis

The paper addresses the need for synthetic data in digital medicine, where privacy constraints and data scarcity limit access to real records. As a review paper, however, its contribution is incremental.

This paper reports a scoping review of 52 publications on generative AI models for synthetic health records. It identifies privacy preservation as the main research objective and finds that adversarial network-based models perform best for longitudinal data, probabilistic models for time series, and large language models for medical text.

This paper presents the results of a novel scoping review on the practical models for generating three different types of synthetic health records (SHRs): medical text, time series, and longitudinal data. The innovative aspects of the review, which incorporate study objectives, data modality, and research methodology of the reviewed studies, uncover the importance and the scope of the topic for the digital medicine context. In total, 52 publications met the eligibility criteria for generating medical time series (22), longitudinal data (17), and medical text (13). Privacy preservation was found to be the main research objective of the studied papers, along with class imbalance, data scarcity, and data imputation as the other objectives. The adversarial network-based, probabilistic, and large language models exhibited superiority for generating synthetic longitudinal data, time series, and medical texts, respectively. Finding a reliable performance measure to quantify SHR re-identification risk is the major research gap of the topic.

