LG · AI · Sep 26, 2025

SurvDiff: A Diffusion Model for Generating Synthetic Data in Survival Analysis

arXiv:2509.22352v1 · 1 citation · h-index: 11
Originality: Incremental advance
AI Analysis

The paper addresses the need in clinical research for realistic synthetic data that reproduces both event-time distributions and censoring mechanisms. It is an incremental advance: it applies diffusion models to a specific domain.

The paper tackled the problem of generating synthetic data for survival analysis, which involves modeling time-to-event outcomes with incomplete event information, and proposed SurvDiff, a diffusion model that outperformed state-of-the-art baselines in distributional fidelity and downstream metrics across multiple medical datasets.

Survival analysis is a cornerstone of clinical research: it models time-to-event outcomes such as metastasis, disease relapse, or patient death. Unlike standard tabular data, survival data often come with incomplete event information due to dropout or loss to follow-up. This poses unique challenges for synthetic data generation, where it is crucial for clinical research to faithfully reproduce both the event-time distribution and the censoring mechanism. In this paper, we propose SurvDiff, an end-to-end diffusion model specifically designed for generating synthetic data in survival analysis. SurvDiff is tailored to capture the data-generating mechanism by jointly generating mixed-type covariates, event times, and right-censoring, guided by a survival-tailored loss function. The loss encodes the time-to-event structure and directly optimizes for downstream survival tasks, which ensures that SurvDiff (i) reproduces realistic event-time distributions and (ii) preserves the censoring mechanism. Across multiple medical datasets, we show that SurvDiff consistently outperforms state-of-the-art generative baselines in both distributional fidelity and downstream evaluation metrics. To the best of our knowledge, SurvDiff is the first diffusion model explicitly designed for generating synthetic survival data.
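
To make the right-censoring setup in the abstract concrete, here is a minimal Python sketch. It is not the SurvDiff model or its loss. It only illustrates what a survival record looks like (an observed time plus an event indicator) and how a Kaplan-Meier estimate summarizes the event-time distribution under censoring. The function names, the exponential distributions, and the rate parameters are illustrative assumptions, not taken from the paper.

```python
import random


def simulate_right_censored(n, event_rate=0.1, censor_rate=0.05, seed=0):
    """Draw (observed_time, event_indicator) pairs.

    Assumption for illustration only: event and censoring times are
    independent exponentials. Real clinical data need not behave this way.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        t_event = rng.expovariate(event_rate)    # latent time to the event
        t_censor = rng.expovariate(censor_rate)  # latent time to dropout
        observed = min(t_event, t_censor)        # only the earlier one is seen
        rows.append((observed, int(t_event <= t_censor)))
    return rows


def kaplan_meier(rows):
    """Return [(time, survival_probability)] at each distinct event time."""
    rows = sorted(rows)
    at_risk = len(rows)
    surv = 1.0
    curve = []
    i = 0
    while i < len(rows):
        t = rows[i][0]
        events = 0
        leaving = 0
        # Group all records that share the same observed time.
        while i < len(rows) and rows[i][0] == t:
            events += rows[i][1]
            leaving += 1
            i += 1
        if events:
            surv *= 1.0 - events / at_risk
            curve.append((t, surv))
        # Censored records leave the risk set without changing the curve.
        at_risk -= leaving
    return curve


if __name__ == "__main__":
    data = simulate_right_censored(500)
    censored_share = 1 - sum(e for _, e in data) / len(data)
    print(f"censored share: {censored_share:.2f}")
    print(kaplan_meier(data)[:3])
```

One plausible use of such a helper is to compute the curve on a real dataset and on a synthetic one and compare them, alongside the share of censored records, as a rough check that both the event-time distribution and the censoring pattern are preserved.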
