Generating Synthetic Relational Tabular Data via Structural Causal Models
This work addresses a critical gap for data scientists and ML practitioners working with real-world relational tabular data, though it appears incremental as an extension of existing SCM-based approaches.
The authors tackled the problem of generating synthetic relational tabular data, which current methods inadequately address, by developing a novel framework based on structural causal models that creates datasets with complex inter-table dependencies. Their experiments confirmed the framework's ability to construct realistic relational datasets mimicking real-world scenarios.
Synthetic tabular data generation has received increasing attention in recent years, particularly with the emergence of foundation models for tabular data. The breakthrough success of TabPFN (Hollmann et al., 2025), which leverages vast quantities of synthetic tabular datasets derived from structural causal models (SCMs), demonstrates the critical role synthetic data plays in developing powerful tabular foundation models. However, most real-world tabular data exists in relational formats spanning multiple interconnected tables, a structure that current generation methods do not adequately address. In this work, we extend the SCM-based approach by developing a novel framework that generates realistic synthetic relational tabular data, including causal relationships across tables. Our experiments confirm that this framework can construct relational datasets with complex inter-table dependencies that mimic real-world scenarios.
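To make the idea of a causal relationship across tables concrete, the following is a minimal sketch and not the authors' framework. It assumes a hypothetical two-table schema (a parent table and a child table linked by a foreign key) and a hand-written SCM in which a child column depends on a column of its parent row. The function name, column names, noise scales, and functional forms are all illustrative assumptions.

```python
import math
import random


def sample_relational_scm(n_parents=5, max_children=4, seed=0):
    """Sample a toy two-table dataset from a hand-written SCM (illustrative only)."""
    rng = random.Random(seed)
    parents, children = [], []
    for parent_id in range(n_parents):
        # Parent table: x1 is exogenous; x2 is caused by x1 plus noise.
        x1 = rng.gauss(0.0, 1.0)
        x2 = math.tanh(x1) + rng.gauss(0.0, 0.1)
        parents.append({"parent_id": parent_id, "x1": x1, "x2": x2})

        # Each parent row gets between 1 and max_children child rows.
        for _ in range(rng.randint(1, max_children)):
            # Cross-table causal edge (assumed): the child's y depends on the parent's x2.
            y = 2.0 * x2 + rng.gauss(0.0, 0.1)
            children.append({"child_id": len(children), "parent_id": parent_id, "y": y})
    return parents, children


parents, children = sample_relational_scm()
```

Here every child row references an existing parent row through parent_id, and the dependence of y on x2 is what links the two tables causally. A fuller generator would presumably also sample the schema and the causal graph rather than fixing them by hand.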