Synthetic medical data generation: state of the art and application to trauma mechanism classification
The paper tackles data privacy issues for medical researchers, but it appears incremental because it builds on existing state-of-the-art methods.
This paper addresses the challenges of patient confidentiality and reproducibility in medical machine learning through synthetic medical data generation. It applies this approach to trauma mechanism classification and proposes a methodology for combining tabular and text data.
Faced with the challenges of patient confidentiality and scientific reproducibility, research on machine learning for health is turning towards the design of synthetic medical databases. This article presents a brief overview of state-of-the-art machine learning methods for generating synthetic tabular and textual data, focusing on their application to the automatic classification of trauma mechanisms, followed by our proposed methodology for generating high-quality synthetic medical records that combine tabular and unstructured text data.