LG | Jul 25, 2025

Dependency-aware synthetic tabular data generation

arXiv:2507.19211v1 | 1 citation | h-index: 11
Originality: Incremental advance
AI Analysis

The paper addresses the need for high-fidelity synthetic data in privacy-sensitive domains such as health care, though the advance is incremental because it builds on existing generative models.

The paper tackles the problem of preserving inter-attribute relationships, such as functional and logical dependencies, in synthetic tabular data. It proposes the Hierarchical Feature Generation Framework (HFGF), which improves dependency preservation across six generative models on four benchmark datasets.

Synthetic tabular data is increasingly used in privacy-sensitive domains such as health care, but existing generative models often fail to preserve inter-attribute relationships. In particular, functional dependencies (FDs) and logical dependencies (LDs), which capture deterministic and rule-based associations between features, are often poorly retained, or not retained at all, in synthetic datasets. To address this research gap, we propose the Hierarchical Feature Generation Framework (HFGF) for synthetic tabular data generation. We created benchmark datasets with known dependencies to evaluate our proposed HFGF. The framework first generates independent features using any standard generative model, and then reconstructs dependent features based on predefined FD and LD rules. Our experiments on four benchmark datasets with varying sizes, feature imbalance, and dependency complexity demonstrate that HFGF improves the preservation of FDs and LDs across six generative models, including CTGAN, TVAE, and GReaT. Our findings demonstrate that HFGF can significantly enhance the structural fidelity and downstream utility of synthetic tabular data.
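The two-stage idea in the abstract (generate independent features first, then reconstruct dependent ones from predefined rules) can be illustrated with a minimal sketch. This is not the authors' implementation. The column names, the `FD_LOOKUP` table, the `allowed_employment` rule, and the random sampler standing in for a generative model such as CTGAN or TVAE are all hypothetical, chosen only to make the flow concrete. The logical dependency is modelled here as a restriction on which values one feature may take given another, which is an assumption about how such rules might be encoded.

```python
import random

# Hypothetical functional dependency: country -> continent (deterministic).
FD_LOOKUP = {"France": "Europe", "Japan": "Asia", "Brazil": "South America"}


def allowed_employment(age):
    """Hypothetical logical dependency: age restricts the allowed employment values."""
    if age < 18:
        return ["student", "none"]
    return ["student", "employed", "retired", "none"]


def sample_independent(n, rng):
    """Stand-in for any generative model: emits only the independent features."""
    countries = sorted(FD_LOOKUP)
    return [
        {"country": rng.choice(countries), "age": rng.randint(0, 90)}
        for _ in range(n)
    ]


def generate(n, seed=0):
    """Stage 1: sample independent features. Stage 2: rebuild dependent ones from rules."""
    rng = random.Random(seed)
    rows = sample_independent(n, rng)
    for row in rows:
        # FD: the dependent value is fully determined by its determinant.
        row["continent"] = FD_LOOKUP[row["country"]]
        # LD: the dependent value is drawn only from the values the rule allows.
        row["employment"] = rng.choice(allowed_employment(row["age"]))
    return rows


def fd_holds(rows, lhs, rhs):
    """Check that each lhs value maps to exactly one rhs value across all rows."""
    seen = {}
    for row in rows:
        if seen.setdefault(row[lhs], row[rhs]) != row[rhs]:
            return False
    return True


if __name__ == "__main__":
    synthetic = generate(5, seed=42)
    for row in synthetic:
        print(row)
    print("FD country -> continent holds:", fd_holds(synthetic, "country", "continent"))
```

Because the dependent columns are never sampled by the model, the rules hold by construction in this sketch. The model only has to learn the joint distribution of the independent features.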

