A Model-Based Approach to Synthetic Data Set Generation for Patient-Ventilator Waveforms for Machine Learning and Educational Use
This work addresses the need for large annotated data sets in medical AI for ICU applications. The contribution is incremental in that it extends an existing lung model rather than introducing a new one.
The authors address the shortage of annotated data for detecting patient-ventilator asynchronies during mechanical ventilation by generating a synthetic data set with a model-based approach. The generated waveforms were verified to reproduce the key features of experimental data.
Although mechanical ventilation is a lifesaving intervention in the ICU, it has harmful side effects, such as barotrauma and volutrauma. These harms can be caused by asynchronies, which are defined as a mismatch between the ventilator timing and the patient's respiratory effort. Automatic detection of these asynchronies, with subsequent feedback, would improve lung ventilation and reduce the probability of lung damage. Neural networks are a promising new approach to asynchrony detection, but they require large annotated data sets, which are difficult to obtain and require complex monitoring of inspiratory effort. In this work, we propose a model-based approach to generate a synthetic data set for machine learning and educational use by extending an existing lung model with a first-order ventilator model. The physiological nature of the derived lung model allows adaptation to various disease archetypes, resulting in a diverse data set. We generated a synthetic data set using 9 different patient archetypes, which are derived from measurements in the literature. The quality of the model and of the synthetic data has been verified by comparison with clinical data, by review by a clinical expert, and by an artificial intelligence model that was trained on experimental data. The evaluation showed that it is possible to generate patient-ventilator waveforms, including asynchronies, that have the most important features of experimental patient-ventilator waveforms.
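To make the idea of coupling a lung model to a first-order ventilator model concrete, the following is a minimal sketch only, not the model used in this work. It assumes a linear single-compartment lung (resistance `R`, compliance `C`) and a ventilator whose airway pressure follows its target with a first-order lag `tau_vent`. The function name `simulate_breaths`, all parameter names, and all default values are hypothetical and chosen for illustration. Shifting the patient effort with `effort_delay` produces a timing mismatch between ventilator and patient, which is the defining property of an asynchrony.

```python
import numpy as np

def simulate_breaths(R=10.0, C=0.05, peep=5.0, p_support=15.0,
                     tau_vent=0.05, rr=15.0, ti=1.0,
                     effort_delay=0.0, p_mus_max=5.0,
                     duration=12.0, dt=0.001):
    """Illustrative single-compartment lung with a first-order ventilator.

    Assumed units: R in cmH2O*s/L, C in L/cmH2O, pressures in cmH2O,
    times in s, rr in breaths/min. All defaults are hypothetical.
    """
    n = int(round(duration / dt))
    t = np.arange(n) * dt
    period = 60.0 / rr

    # Ventilator target pressure: PEEP plus support during inspiration.
    phase_vent = np.mod(t, period)
    p_target = np.where(phase_vent < ti, peep + p_support, peep)

    # Patient effort: half-sine muscle pressure, shifted by effort_delay.
    phase_pat = np.mod(t - effort_delay, period)
    p_mus = np.where(phase_pat < ti,
                     p_mus_max * np.sin(np.pi * phase_pat / ti), 0.0)

    p_aw = np.empty(n)
    flow = np.empty(n)
    vol = np.empty(n)
    p, v = peep, 0.0  # airway pressure and volume above end-expiratory level
    for k in range(n):
        # Equation of motion: P_aw + P_mus = R*Q + V/C + PEEP
        q = (p - peep + p_mus[k] - v / C) / R
        p_aw[k], flow[k], vol[k] = p, q, v
        # Forward-Euler updates of lung volume and ventilator pressure.
        v += dt * q
        p += dt * (p_target[k] - p) / tau_vent
    return t, p_aw, flow, vol, p_mus

# Synchronous breaths versus breaths with a delayed patient effort.
t, p_aw, flow, vol, p_mus = simulate_breaths()
t_d, p_aw_d, flow_d, vol_d, p_mus_d = simulate_breaths(effort_delay=2.0)
```

In this sketch, different patient archetypes would correspond to different choices of `R` and `C`, and because `effort_delay` is known for every simulated breath, each waveform carries its label by construction, which is what makes model-based generation attractive for training data.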