Joint Relational Database Generation via Graph-Conditional Diffusion Models
This work addresses the need for flexible and accurate relational database generation in applications such as privacy-preserving data release, although it is an incremental improvement over existing methods.
The paper tackles relational database generation with a joint modeling approach that imposes no fixed table order. The approach outperforms autoregressive baselines in capturing multi-hop inter-table correlations and achieves state-of-the-art performance on single-table fidelity metrics.
Building generative models for relational databases (RDBs) is important for applications such as privacy-preserving data release and augmenting real datasets. However, most prior work either focuses on single-table generation or relies on autoregressive factorizations that impose a fixed table order and generate tables sequentially. This approach limits parallelism, restricts flexibility in downstream applications such as missing value imputation, and compounds errors arising from the conditional independence assumptions these methods commonly make. We propose a fundamentally different approach: jointly modeling all tables in an RDB without imposing any order. Building on a natural graph representation of RDBs, we propose the Graph-Conditional Relational Diffusion Model (GRDM). GRDM leverages a graph neural network to jointly denoise row attributes and capture complex inter-table dependencies. Extensive experiments on six real-world RDBs demonstrate that our approach substantially outperforms autoregressive baselines in modeling multi-hop inter-table correlations and achieves state-of-the-art performance on single-table fidelity metrics.
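The abstract describes GRDM only at a high level, so the following is a toy illustration under stated assumptions, not the authors' implementation. It assumes rows are graph nodes and foreign-key references are edges (one reading of "natural graph representation"). It also assumes a standard DDPM-style Gaussian forward process on numeric attributes, and it replaces the learned GNN denoiser with a single hypothetical mean-aggregation step. All tables, rows, and functions (`build_graph`, `noise_all_rows`, `neighbor_summary`, `denoiser_inputs`) are hypothetical. The sketch shows the two ideas the abstract emphasizes: every row of every table is noised at the same step with no table order, and each row's denoiser input carries context from the rows linked to it by foreign keys.

```python
import math
import random
from collections import defaultdict

# Toy sketch (assumption): illustrates graph-conditional joint noising, NOT GRDM itself.
# Hypothetical toy RDB: table name -> {row id -> numeric attribute vector}.
TABLES = {
    "users": {"u1": [0.5, 1.0], "u2": [-0.3, 0.2]},
    "orders": {"o1": [2.0], "o2": [1.5], "o3": [-1.0]},
}
# Hypothetical foreign keys: (child table, child row, parent table, parent row).
FOREIGN_KEYS = [
    ("orders", "o1", "users", "u1"),
    ("orders", "o2", "users", "u1"),
    ("orders", "o3", "users", "u2"),
]


def build_graph(tables, foreign_keys):
    """Treat every row as a node and every foreign-key reference as an undirected edge."""
    nodes = [(table, row) for table, rows in tables.items() for row in rows]
    adj = defaultdict(set)
    for child_t, child_r, parent_t, parent_r in foreign_keys:
        adj[(child_t, child_r)].add((parent_t, parent_r))
        adj[(parent_t, parent_r)].add((child_t, child_r))
    return nodes, adj


def noise_all_rows(tables, alpha_bar, seed=0):
    """Apply one Gaussian forward-diffusion step to every row of every table at once.

    x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps, with eps ~ N(0, 1).
    No table is processed before another: all rows share the same noise level.
    """
    rng = random.Random(seed)
    signal, noise = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return {
        table: {
            row: [signal * v + noise * rng.gauss(0.0, 1.0) for v in attrs]
            for row, attrs in rows.items()
        }
        for table, rows in tables.items()
    }


def neighbor_summary(node, adj, feats):
    """Average of each neighbor's mean attribute: a crude stand-in for a learned GNN message."""
    nbrs = adj.get(node, ())
    if not nbrs:
        return 0.0
    means = [sum(feats[t][r]) / len(feats[t][r]) for t, r in nbrs]
    return sum(means) / len(means)


def denoiser_inputs(nodes, adj, noisy):
    """Per-row input for a hypothetical denoiser: own noisy attributes plus graph context."""
    return {
        (t, r): noisy[t][r] + [neighbor_summary((t, r), adj, noisy)]
        for t, r in nodes
    }


if __name__ == "__main__":
    nodes, adj = build_graph(TABLES, FOREIGN_KEYS)
    noisy = noise_all_rows(TABLES, alpha_bar=0.5)
    for node, x in denoiser_inputs(nodes, adj, noisy).items():
        print(node, [round(v, 3) for v in x])
```

Because each row receives a summary of its foreign-key neighbors, stacking k such aggregation steps would let information travel k hops across tables. This is one intuition for why a graph-based denoiser can capture multi-hop inter-table correlations; the abstract does not say whether GRDM uses this particular aggregation.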