LG | Oct 22, 2024

Masked Clinical Modelling: A Framework for Synthetic and Augmented Survival Data Generation

arXiv:2410.16811v2 | 2 citations | h-index: 4 | Studies in Health Technology and Informatics
Originality: Incremental advance
AI Analysis

This work addresses privacy barriers in healthcare research by enabling secure data sharing and model development. It is an incremental advance, since it adapts masked language modelling to a specific domain.

The paper tackles restricted access to clinical data by proposing Masked Clinical Modelling (MCM), a framework for generating synthetic and augmented survival data. The authors report that MCM-generated data improves both discrimination and calibration in survival analysis relative to existing methods.

Access to real clinical data is often restricted due to privacy obligations, creating significant barriers for healthcare research. Synthetic datasets provide a promising solution, enabling secure data sharing and model development. However, most existing approaches focus on data realism rather than utility -- ensuring that models trained on synthetic data yield clinically meaningful insights comparable to those trained on real data. In this paper, we present Masked Clinical Modelling (MCM), a framework inspired by masked language modelling, designed for both data synthesis and conditional data augmentation. We evaluate this prototype on the WHAS500 dataset using Cox Proportional Hazards models, focusing on the preservation of hazard ratios as key clinical metrics. Our results show that data generated using the MCM framework improves both discrimination and calibration in survival analysis, outperforming existing methods. MCM demonstrates strong potential to support survival data analysis and broader healthcare applications.
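
As a rough illustration of the masking idea that MCM borrows from masked language modelling, the sketch below hides a random subset of fields in one tabular record and keeps the hidden values as reconstruction targets. It is not the authors' implementation: the function name `mask_record`, the `MASK` placeholder, the mask ratio, and the field names and values in the toy record are all assumptions made for illustration.

```python
import random

MASK = None  # hypothetical placeholder marking a hidden value


def mask_record(record, mask_ratio=0.3, rng=None):
    """Hide a random subset of fields; return (masked_record, targets)."""
    rng = rng or random.Random()
    keys = list(record)
    # Mask at least one field (when any exist) so there is something to reconstruct.
    n_mask = min(len(keys), max(1, round(mask_ratio * len(keys))))
    hidden = set(rng.sample(keys, n_mask))
    masked = {k: (MASK if k in hidden else v) for k, v in record.items()}
    targets = {k: record[k] for k in hidden}
    return masked, targets


# Toy record with made-up values; field names are illustrative only.
patient = {"age": 71, "sex": 1, "bmi": 27.4, "heart_rate": 88, "time": 412, "event": 0}
masked, targets = mask_record(patient, mask_ratio=0.5, rng=random.Random(0))
```

A model trained to predict `targets` from `masked` could, in principle, be conditioned on the visible fields of a real record to produce augmented records, or sampled repeatedly to produce synthetic ones. How MCM actually does this is not specified in the abstract.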

