LG · AI · Jan 14, 2022

Synthesising Electronic Health Records: Cystic Fibrosis Patient Group

arXiv:2201.05400v1 · 1 citation
Originality Synthesis-oriented
AI Analysis

This work addresses bias and generalizability challenges in healthcare machine learning by enhancing interoperability and privacy for small patient groups, but it is incremental as it applies existing deep generative methods to a new domain.

The paper addresses class imbalance, which degrades predictive performance in supervised learning, by evaluating synthetic data generators for electronic health records of cystic fibrosis patients. It finds that augmenting imbalanced datasets with synthetic data increases predictive performance in patient outcome classification.

Class imbalance can often degrade the predictive performance of supervised learning algorithms. Balanced classes can be obtained by oversampling the minority class with exact copies, with copies perturbed by noise, or with interpolation between nearest neighbours (as in traditional SMOTE methods). Oversampling tabular data by augmentation, as is typical in computer vision tasks, can be achieved with deep generative models. Deep generative models are effective data synthesisers because they can capture complex underlying distributions. Synthetic data in healthcare can enhance interoperability between healthcare providers by ensuring patient privacy. Equipped with large synthetic datasets that represent small patient groups well, machine learning in healthcare can address the current challenges of bias and generalisability. This paper evaluates synthetic data generators' ability to synthesise patient electronic health records. We test the utility of synthetic data for patient outcome classification and observe increased predictive performance when augmenting imbalanced datasets with synthetic data.
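The three classical oversampling strategies named in the abstract (exact copies, copies with noise, and SMOTE-style interpolation between nearest neighbours) can be illustrated with a minimal sketch. It is not the paper's method: the paper evaluates deep generative models, which are not shown here. The function name `oversample_minority`, its parameters, and the toy data are hypothetical and chosen only for illustration; the sketch assumes numeric features stored as lists of floats.

```python
import random


def oversample_minority(rows, n_new, method="smote", k=3, noise_sd=0.05, seed=0):
    """Return n_new synthetic minority-class rows (hypothetical helper).

    rows     : list of minority-class samples, each a list of floats
    method   : "copy"  -> exact duplicates
               "noise" -> duplicates perturbed by Gaussian noise
               "smote" -> interpolation towards one of the k nearest neighbours
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(rows)
        if method == "copy":
            new_row = list(base)
        elif method == "noise":
            new_row = [v + rng.gauss(0.0, noise_sd) for v in base]
        elif method == "smote":
            # Rank the other minority rows by squared Euclidean distance to base.
            others = [r for r in rows if r is not base]
            if not others:
                raise ValueError("SMOTE-style interpolation needs at least two rows")
            others.sort(key=lambda r: sum((a - b) ** 2 for a, b in zip(base, r)))
            neighbour = rng.choice(others[:k])
            # Pick a random point on the segment between base and neighbour.
            t = rng.random()
            new_row = [a + t * (b - a) for a, b in zip(base, neighbour)]
        else:
            raise ValueError(f"unknown method: {method}")
        synthetic.append(new_row)
    return synthetic


# Toy minority class (made-up values, not patient data).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
balanced_extra = oversample_minority(minority, n_new=6, method="smote", k=2)
```

Interpolated rows always lie on a line segment between two existing minority samples, so this baseline cannot produce records outside the convex hull of the observed data. That limitation is one motivation for the generative approach the abstract describes.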

