LGMar 12, 2022
The Health Gym: Synthetic Health-Related Datasets for the Development of Reinforcement Learning AlgorithmsNicholas I-Hsien Kuo, Mark N. Polizzotto, Simon Finfer et al.
In recent years, the machine learning research community has benefited tremendously from the availability of openly accessible benchmark datasets. Clinical data are usually not openly available due to their highly confidential nature. This has hampered the development of reproducible and generalisable machine learning applications in health care. Here we introduce the Health Gym - a growing collection of highly realistic synthetic medical datasets that can be freely accessed to prototype, evaluate, and compare machine learning algorithms, with a specific focus on reinforcement learning. The three synthetic datasets described in this paper present patient cohorts with acute hypotension and sepsis in the intensive care unit, and people with human immunodeficiency virus (HIV) receiving antiretroviral therapy in ambulatory care. The datasets were created using a novel generative adversarial network (GAN). The distributions of variables, and correlations between variables and trends over time in the synthetic datasets mirror those in the real datasets. Furthermore, the risk of sensitive information disclosure associated with the public distribution of the synthetic datasets is estimated to be very low.
LGAug 18, 2022
Generating Synthetic Clinical Data that Capture Class Imbalanced Distributions with Generative Adversarial Networks: Example using Antiretroviral Therapy for HIVNicholas I-Hsien Kuo, Federico Garcia, Anders Sönnerborg et al.
Clinical data usually cannot be freely distributed due to their highly confidential nature and this hampers the development of machine learning in the healthcare domain. One way to mitigate this problem is by generating realistic synthetic datasets using generative adversarial networks (GANs). However, GANs are known to suffer from mode collapse thus creating outputs of low diversity. This lowers the quality of the synthetic healthcare data, and may cause it to omit patients of minority demographics or neglect less common clinical practices. In this paper, we extend the classic GAN setup with an additional variational autoencoder (VAE) and include an external memory to replay latent features observed from the real samples to the GAN generator. Using antiretroviral therapy for human immunodeficiency virus (ART for HIV) as a case study, we show that our extended setup overcomes mode collapse and generates a synthetic dataset that accurately describes severely imbalanced class distributions commonly found in real-world clinical variables. In addition, we demonstrate that our synthetic dataset is associated with a very low patient disclosure risk, and that it retains a high level of utility from the ground truth dataset to support the development of downstream machine learning algorithms.
LGMar 22, 2023
Synthetic Health-related Longitudinal Data with Mixed-type Variables Generated using Diffusion ModelsNicholas I-Hsien Kuo, Louisa Jorm, Sebastiano Barbieri
This paper presents a novel approach to simulating electronic health records (EHRs) using diffusion probabilistic models (DPMs). Specifically, we demonstrate the effectiveness of DPMs in synthesising longitudinal EHRs that capture mixed-type variables, including numeric, binary, and categorical variables. To our knowledge, this represents the first use of DPMs for this purpose. We compared our DPM-simulated datasets to previous state-of-the-art results based on generative adversarial networks (GANs) for two clinical applications: acute hypotension and human immunodeficiency virus (ART for HIV). Given the lack of similar previous studies in DPMs, a core component of our work involves exploring the advantages and caveats of employing DPMs across a wide range of aspects. In addition to assessing the realism of the synthetic datasets, we also trained reinforcement learning (RL) agents on the synthetic data to evaluate their utility for supporting the development of downstream machine learning models. Finally, we estimated that our DPM-simulated datasets are secure and posed a low patient exposure risk for public access.
LGFeb 28, 2025
Generating Clinically Realistic EHR Data via a Hierarchy- and Semantics-Guided TransformerGuanglin Zhou, Sebastiano Barbieri
Generating realistic synthetic electronic health records (EHRs) holds tremendous promise for accelerating healthcare research, facilitating AI model development and enhancing patient privacy. However, existing generative methods typically treat EHRs as flat sequences of discrete medical codes. This approach overlooks two critical aspects: the inherent hierarchical organization of clinical coding systems and the rich semantic context provided by code descriptions. Consequently, synthetic patient sequences often lack high clinical fidelity and have limited utility in downstream clinical tasks. In this paper, we propose the Hierarchy- and Semantics-Guided Transformer (HiSGT), a novel framework that leverages both hierarchical and semantic information for the generative process. HiSGT constructs a hierarchical graph to encode parent-child and sibling relationships among clinical codes and employs a graph neural network to derive hierarchy-aware embeddings. These are then fused with semantic embeddings extracted from a pre-trained clinical language model (e.g., ClinicalBERT), enabling the Transformer-based generator to more accurately model the nuanced clinical patterns inherent in real EHRs. Extensive experiments on the MIMIC-III and MIMIC-IV datasets demonstrate that HiSGT significantly improves the statistical alignment of synthetic data with real patient records, as well as supports robust downstream applications such as chronic disease classification. By addressing the limitations of conventional raw code-based generative models, HiSGT represents a significant step toward clinically high-fidelity synthetic data generation and a general paradigm suitable for interpretable medical code representation, offering valuable applications in data augmentation and privacy-preserving healthcare analytics.
MED-PHDec 30, 2024
Acquisition-Independent Deep Learning for Quantitative MRI Parameter Estimation using Neural Controlled Differential EquationsDaan Kuppens, Sebastiano Barbieri, Daisy van den Berg et al.
Deep learning has proven to be a suitable alternative to least-squares (LSQ) fitting for parameter estimation in various quantitative MRI (QMRI) models. However, current deep learning implementations are not robust to changes in MR acquisition protocols. In practice, QMRI acquisition protocols differ substantially between different studies and clinical settings. The lack of generalizability and adoptability of current deep learning approaches for QMRI parameter estimation impedes the implementation of these algorithms in clinical trials and clinical practice. Neural Controlled Differential Equations (NCDEs) allow for the sampling of incomplete and irregularly sampled data with variable length, making them ideal for use in QMRI parameter estimation. In this study, we show that NCDEs can function as a generic tool for the accurate prediction of QMRI parameters, regardless of QMRI sequence length, configuration of independent variables and QMRI forward model (variable flip angle T1-mapping, intravoxel incoherent motion MRI, dynamic contrast-enhanced MRI). NCDEs achieved lower mean squared error than LSQ fitting in low-SNR simulations and in vivo in challenging anatomical regions like the abdomen and leg, but this improvement was no longer evident at high SNR. NCDEs reduce estimation error interquartile range without increasing bias, particularly under conditions of high uncertainty. These findings suggest that NCDEs offer a robust approach for reliable QMRI parameter estimation, especially in scenarios with high uncertainty or low image quality. We believe that with NCDEs, we have solved one of the main challenges for using deep learning for QMRI parameter estimation in a broader clinical and research setting.
LGDec 7, 2021
Synthetic Acute Hypotension and Sepsis Datasets Based on MIMIC-III and Published as Part of the Health Gym ProjectNicholas I-Hsien Kuo, Mark Polizzotto, Simon Finfer et al.
These two synthetic datasets comprise vital signs, laboratory test results, administered fluid boluses and vasopressors for 3,910 patients with acute hypotension and for 2,164 patients with sepsis in the Intensive Care Unit (ICU). The patient cohorts were built using previously published inclusion and exclusion criteria and the data were created using Generative Adversarial Networks (GANs) and the MIMIC-III Clinical Database. The risk of identity disclosure associated with the release of these data was estimated to be very low (0.045%). The datasets were generated and published as part of the Health Gym, a project aiming to publicly distribute synthetic longitudinal health data for developing machine learning algorithms (with a particular focus on offline reinforcement learning) and for educational purposes.
LGAug 17, 2021
Incorporating Uncertainty in Learning to Defer Algorithms for Safe Computer-Aided DiagnosisJessie Liu, Blanca Gallego, Sebastiano Barbieri
Deep neural networks are increasingly being used for computer-aided diagnosis, but erroneous diagnoses can be extremely costly for patients. We propose a learning to defer with uncertainty (LDU) algorithm which identifies patients for whom diagnostic uncertainty is high and defers them for evaluation by human experts. LDU was evaluated on the diagnosis of myocardial infarction (using discharge summaries), the diagnosis of any comorbidities (using structured data), and the diagnosis of pleural effusion and pneumothorax (using chest x-rays), and compared with 'learning to defer without uncertainty information' (LD) and 'direct triage by uncertainty' (DT) methods. LDU achieved the same F1 score as LD but deferred considerably fewer patients (e.g. 36% vs. 69% deferral rate for diagnosing pleural effusion with an F1 score of 0.96). Furthermore, even when many patients were assigned the wrong diagnosis with high confidence (e.g. for the diagnosis of any comorbidities) LDU achieved a 17% increase in F1 score, whereas DT was not applicable. Importantly, the weight of the defer loss in LDU can be easily adjusted to obtain the desired trade-off between diagnostic accuracy and deferral rate. In conclusion, LDU can readily augment any existing diagnostic network to reduce the risk of erroneous diagnoses in clinical practice.
LGNov 28, 2020
Predicting cardiovascular risk from national administrative databases using a combined survival analysis and deep learning approachSebastiano Barbieri, Suneela Mehta, Billy Wu et al.
AIMS. This study compared the performance of deep learning extensions of survival analysis models with traditional Cox proportional hazards (CPH) models for deriving cardiovascular disease (CVD) risk prediction equations in national health administrative datasets. METHODS. Using individual person linkage of multiple administrative datasets, we constructed a cohort of all New Zealand residents aged 30-74 years who interacted with publicly funded health services during 2012, and identified hospitalisations and deaths from CVD over five years of follow-up. After excluding people with prior CVD or heart failure, sex-specific deep learning and CPH models were developed to estimate the risk of fatal or non-fatal CVD events within five years. The proportion of explained time-to-event occurrence, calibration, and discrimination were compared between models across the whole study population and in specific risk groups. FINDINGS. First CVD events occurred in 61,927 of 2,164,872 people. Among diagnoses and procedures, the largest 'local' hazard ratios were associated by the deep learning models with tobacco use in women (2.04, 95%CI: 1.99-2.10) and with chronic obstructive pulmonary disease with acute lower respiratory infection in men (1.56, 95%CI: 1.50-1.62). Other identified predictors (e.g. hypertension, chest pain, diabetes) aligned with current knowledge about CVD risk predictors. The deep learning models significantly outperformed the CPH models on the basis of proportion of explained time-to-event occurrence (Royston and Sauerbrei's R-squared: 0.468 vs. 0.425 in women and 0.383 vs. 0.348 in men), calibration, and discrimination (all p<0.0001). INTERPRETATION. Deep learning extensions of survival analysis models can be applied to large health administrative databases to derive interpretable CVD risk prediction equations that are more accurate than traditional CPH models.
MED-PHNov 3, 2020
Improved unsupervised physics-informed deep learning for intravoxel incoherent motion modeling and evaluation in pancreatic cancer patientsMisha P. T. Kaandorp, Sebastiano Barbieri, Remy Klaassen et al.
${\bf Purpose}$: Earlier work showed that IVIM-NET$_{orig}$, an unsupervised physics-informed deep neural network, was more accurate than other state-of-the-art intravoxel-incoherent motion (IVIM) fitting approaches to DWI. This study presents an improved version: IVIM-NET$_{optim}$, and characterizes its superior performance in pancreatic ductal adenocarcinoma (PDAC) patients. ${\bf Method}$: In simulations (SNR=20), the accuracy, independence and consistency of IVIM-NET were evaluated for combinations of hyperparameters (fit S0, constraints, network architecture, # hidden layers, dropout, batch normalization, learning rate), by calculating the NRMSE, Spearman's $ρ$, and the coefficient of variation (CV$_{NET}$), respectively. The best performing network, IVIM-NET$_{optim}$ was compared to least squares (LS) and a Bayesian approach at different SNRs. IVIM-NET$_{optim}$'s performance was evaluated in 23 PDAC patients. 14 of the patients received no treatment between scan sessions and 9 received chemoradiotherapy between sessions. Intersession within-subject standard deviations (wSD) and treatment-induced changes were assessed. ${\bf Results}$: In simulations, IVIM-NET$_{optim}$ outperformed IVIM-NET$_{orig}$ in accuracy (NRMSE(D)=0.18 vs 0.20; NMRSE(f)=0.22 vs 0.27; NMRSE(D*)=0.39 vs 0.39), independence ($ρ$(D*,f)=0.22 vs 0.74) and consistency (CV$_{NET}$ (D)=0.01 vs 0.10; CV$_{NET}$ (f)=0.02 vs 0.05; CV$_{NET}$ (D*)=0.04 vs 0.11). IVIM-NET$_{optim}$ showed superior performance to the LS and Bayesian approaches at SNRs<50. In vivo, IVIM-NET$_{optim}$ sshowed significantly less noisy parameter maps with lower wSD for D and f than the alternatives. In the treated cohort, IVIM-NET$_{optim}$ detected the most individual patients with significant parameter changes compared to day-to-day variations. ${\bf Conclusion}$: IVIM-NET$_{optim}$ is recommended for IVIM fitting to DWI data.
LGMay 21, 2019
Benchmarking Deep Learning Architectures for Predicting Readmission to the ICU and Describing Patients-at-RiskSebastiano Barbieri, James Kemp, Oscar Perez-Concha et al.
Objective: To compare different deep learning architectures for predicting the risk of readmission within 30 days of discharge from the intensive care unit (ICU). The interpretability of attention-based models is leveraged to describe patients-at-risk. Methods: Several deep learning architectures making use of attention mechanisms, recurrent layers, neural ordinary differential equations (ODEs), and medical concept embeddings with time-aware attention were trained using publicly available electronic medical record data (MIMIC-III) associated with 45,298 ICU stays for 33,150 patients. Bayesian inference was used to compute the posterior over weights of an attention-based model. Odds ratios associated with an increased risk of readmission were computed for static variables. Diagnoses, procedures, medications, and vital signs were ranked according to the associated risk of readmission. Results: A recurrent neural network, with time dynamics of code embeddings computed by neural ODEs, achieved the highest average precision of 0.331 (AUROC: 0.739, F1-Score: 0.372). Predictive accuracy was comparable across neural network architectures. Groups of patients at risk included those suffering from infectious complications, with chronic or progressive conditions, and for whom standard medical care was not suitable. Conclusions: Attention-based networks may be preferable to recurrent networks if an interpretable model is required, at only marginal cost in predictive accuracy.
QMFeb 28, 2019
Deep Learning How to Fit an Intravoxel Incoherent Motion Model to Diffusion-Weighted MRISebastiano Barbieri, Oliver J. Gurney-Champion, Remy Klaassen et al.
Purpose: This prospective clinical study assesses the feasibility of training a deep neural network (DNN) for intravoxel incoherent motion (IVIM) model fitting to diffusion-weighted magnetic resonance imaging (DW-MRI) data and evaluates its performance. Methods: In May 2011, ten male volunteers (age range: 29 to 53 years, mean: 37 years) underwent DW-MRI of the upper abdomen on 1.5T and 3.0T magnetic resonance scanners. Regions of interest in the left and right liver lobe, pancreas, spleen, renal cortex, and renal medulla were delineated independently by two readers. DNNs were trained for IVIM model fitting using these data; results were compared to least-squares and Bayesian approaches to IVIM fitting. Intraclass Correlation Coefficients (ICC) were used to assess consistency of measurements between readers. Intersubject variability was evaluated using Coefficients of Variation (CV). The fitting error was calculated based on simulated data and the average fitting time of each method was recorded. Results: DNNs were trained successfully for IVIM parameter estimation. This approach was associated with high consistency between the two readers (ICCs between 50 and 97%), low intersubject variability of estimated parameter values (CVs between 9.2 and 28.4), and the lowest error when compared with least-squares and Bayesian approaches. Fitting by DNNs was several orders of magnitude quicker than the other methods but the networks may need to be re-trained for different acquisition protocols or imaged anatomical regions. Conclusion: DNNs are recommended for accurate and robust IVIM model fitting to DW-MRI data. Suitable software is available at (1).
CVOct 23, 2013
A Ray-based Approach for Boundary Estimation of Fiber Bundles Derived from Diffusion Tensor ImagingMiriam H. A. Bauer, Sebastiano Barbieri, Jan Klein et al.
Diffusion Tensor Imaging (DTI) is a non-invasive imaging technique that allows estimation of the location of white matter tracts in-vivo, based on the measurement of water diffusion properties. For each voxel, a second-order tensor can be calculated by using diffusion-weighted sequences (DWI) that are sensitive to the random motion of water molecules. Given at least 6 diffusion-weighted images with different gradients and one unweighted image, the coefficients of the symmetric diffusion tensor matrix can be calculated. Deriving the eigensystem of the tensor, the eigenvectors and eigenvalues can be calculated to describe the three main directions of diffusion and its magnitude. Using DTI data, fiber bundles can be determined, to gain information about eloquent brain structures. Especially in neurosurgery, information about location and dimension of eloquent structures like the corticospinal tract or the visual pathways is of major interest. Therefore, the fiber bundle boundary has to be determined. In this paper, a novel ray-based approach for boundary estimation of tubular structures is presented.