LG · Jul 15, 2025

Synthetic Tabular Data Generation: A Comparative Survey for Modern Techniques

arXiv:2507.11590v1 · 5 citations · h-index: 4
Originality: Synthesis-oriented
AI Analysis

The survey addresses data scarcity and privacy concerns in domains such as finance and healthcare, offering an incremental review that contributes new ways of organizing the field.

This survey tackles the challenge of generating synthetic tabular data under privacy constraints. It reviews modern techniques that preserve feature relationships and statistical fidelity, and it proposes a novel taxonomy and a benchmark framework to guide research and implementation.

As privacy regulations become more stringent and access to real-world data becomes increasingly constrained, synthetic data generation has emerged as a vital solution, especially for tabular datasets, which are central to domains like finance, healthcare and the social sciences. This survey presents a comprehensive and focused review of recent advances in synthetic tabular data generation, emphasizing methods that preserve complex feature relationships, maintain statistical fidelity, and satisfy privacy requirements. A key contribution of this work is the introduction of a novel taxonomy based on practical generation objectives, including intended downstream applications, privacy guarantees, and data utility, directly informing methodological design and evaluation strategies. Therefore, this review prioritizes the actionable goals that drive synthetic data creation, including conditional generation and risk-sensitive modeling. Additionally, the survey proposes a benchmark framework to align technical innovation with real-world demands. By bridging theoretical foundations with practical deployment, this work serves as both a roadmap for future research and a guide for implementing synthetic tabular data in privacy-critical environments.
