LG · AI · Nov 30, 2022

Generating Realistic Synthetic Relational Data through Graph Variational Autoencoders

arXiv:2211.16889v1 · 11 citations · h-index: 26
Originality: Incremental advance
AI Analysis

This work addresses the need for reliable synthetic data in industries such as healthcare and finance, but it is an incremental advance, since it adapts existing frameworks to a new domain.

The paper tackles the generation of realistic synthetic relational databases. This is challenging because the underlying generative methods were developed for image synthesis and do not transfer straightforwardly to tabular and relational data. The results indicate that the method accurately preserves the structure of real databases, even for large datasets with advanced data types.
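What "preserving the structure" of a relational database means can be made concrete with a small sketch. The listing does not describe how the paper converts tables into a graph or which fidelity metrics it uses, so everything below is an illustrative assumption: the `patients`/`visits` tables and all function names are hypothetical, rows become nodes, foreign-key references become edges, and one simple structural check compares the "children per parent" histogram of a real and a synthetic database using total variation distance.

```python
from collections import Counter


def database_to_graph(parent_rows, parent_pk, child_rows, child_pk, fk):
    """Represent two tables linked by a foreign key as a graph (hypothetical helper).

    Nodes are ("parent", pk) and ("child", pk) pairs; every child row gets
    one edge pointing at the parent row referenced by its foreign key.
    """
    nodes = [("parent", r[parent_pk]) for r in parent_rows]
    nodes += [("child", r[child_pk]) for r in child_rows]
    edges = [(("child", r[child_pk]), ("parent", r[fk])) for r in child_rows]
    return nodes, edges


def children_per_parent(nodes, edges):
    """Histogram mapping 'number of child rows' -> 'number of parents with that count'."""
    counts = Counter(dst for _, dst in edges)
    # Counter returns 0 for parents that no child row references.
    return Counter(counts[n] for n in nodes if n[0] == "parent")


def total_variation(h1, h2):
    """Total variation distance between two histograms (0 means identical shape)."""
    n1, n2 = sum(h1.values()), sum(h2.values())
    keys = set(h1) | set(h2)
    return 0.5 * sum(abs(h1[k] / n1 - h2[k] / n2) for k in keys)


# Hypothetical toy data: which patient each visit belongs to differs between
# the real and synthetic tables, but the shape of the relationship is the same.
patients = [{"patient_id": p} for p in (1, 2, 3)]
real_visits = [{"visit_id": 10, "patient_id": 1},
               {"visit_id": 11, "patient_id": 1},
               {"visit_id": 12, "patient_id": 3}]
synthetic_visits = [{"visit_id": 20, "patient_id": 2},
                    {"visit_id": 21, "patient_id": 2},
                    {"visit_id": 22, "patient_id": 1}]

real_hist = children_per_parent(
    *database_to_graph(patients, "patient_id", real_visits, "visit_id", "patient_id"))
synth_hist = children_per_parent(
    *database_to_graph(patients, "patient_id", synthetic_visits, "visit_id", "patient_id"))
print(total_variation(real_hist, synth_hist))  # prints 0.0
```

A distance of 0.0 here means the synthetic table reproduces the real one-to-many pattern (one patient with two visits, one with one, one with none) without copying any specific row links, which is the kind of structural fidelity a privacy-oriented generator aims for. A real evaluation would need many more statistics than this single histogram.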

Synthetic data generation has recently gained widespread attention as a more reliable alternative to traditional data anonymization. The methods involved were originally developed for image synthesis. Hence, their application to the typically tabular and relational datasets from healthcare, finance and other industries is non-trivial. While substantial research has been devoted to the generation of realistic tabular datasets, the study of synthetic relational databases is still in its infancy. In this paper, we combine the variational autoencoder framework with graph neural networks to generate realistic synthetic relational databases. We then apply the obtained method to two publicly available databases in computational experiments. The results indicate that real databases' structures are accurately preserved in the resulting synthetic datasets, even for large datasets with advanced data types.
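To illustrate how a variational autoencoder can be combined with a graph neural network, here is a minimal forward-pass sketch in plain Python. It is not the authors' architecture, which this listing does not describe: the mean-aggregation message-passing layer, the linear encoder and decoder, the squared-error reconstruction term (a stand-in for a Gaussian log-likelihood up to constants), and all function and parameter names are assumptions chosen for illustration. Weights are fixed by hand rather than trained, and only per-row attributes are decoded; generating the link structure itself is left out.

```python
import math
import random


def mean_aggregate(features, edges):
    """One message-passing step: each node averages its own features with
    those of its neighbours (edges treated as undirected, self-loops added)."""
    neighbours = [[i] for i in range(len(features))]
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    dim = len(features[0])
    return [[sum(features[j][d] for j in nbrs) / len(nbrs) for d in range(dim)]
            for nbrs in neighbours]


def linear(x, weights, bias):
    """Affine map y = W x + b for a single vector x (W given row by row)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def encode(features, edges, params, rng):
    """Hypothetical GNN encoder: message passing, per-node Gaussian parameters,
    then a reparameterised sample z = mu + sigma * eps with eps ~ N(0, 1)."""
    h = mean_aggregate(features, edges)
    mus = [linear(x, params["w_mu"], params["b_mu"]) for x in h]
    logvars = [linear(x, params["w_lv"], params["b_lv"]) for x in h]
    zs = [[m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, lv_row)]
          for mu, lv_row in zip(mus, logvars)]
    return zs, mus, logvars


def decode(zs, params):
    """Hypothetical linear decoder mapping each latent vector back to row features."""
    return [linear(z, params["w_dec"], params["b_dec"]) for z in zs]


def kl_to_standard_normal(mus, logvars):
    """KL(N(mu, sigma^2) || N(0, 1)), summed over nodes and latent dimensions."""
    return sum(-0.5 * (1.0 + lv - m * m - math.exp(lv))
               for mu, lv_row in zip(mus, logvars)
               for m, lv in zip(mu, lv_row))


def negative_elbo(features, edges, params, rng):
    """Squared-error reconstruction plus KL: the quantity a VAE would minimise."""
    zs, mus, logvars = encode(features, edges, params, rng)
    recon = decode(zs, params)
    sq_err = sum((r - x) ** 2 for rr, xx in zip(recon, features) for r, x in zip(rr, xx))
    return sq_err + kl_to_standard_normal(mus, logvars)


# Toy usage: 3 rows with 2 numeric columns, linked in a chain (e.g. by foreign keys).
rng = random.Random(0)
features = [[0.0, 0.0], [3.0, 3.0], [6.0, 6.0]]
edges = [(0, 1), (1, 2)]
params = {
    "w_mu": [[0.1, 0.1]], "b_mu": [0.0],             # 2 features -> 1 latent dim
    "w_lv": [[0.0, 0.0]], "b_lv": [0.0],
    "w_dec": [[1.0], [1.0]], "b_dec": [0.0, 0.0],    # 1 latent dim -> 2 features
}
loss = negative_elbo(features, edges, params, rng)

# Generation: sample latents from the prior N(0, I) and decode them into synthetic rows.
synthetic = decode([[rng.gauss(0.0, 1.0)] for _ in range(3)], params)
```

The design point this sketch is meant to show is that message passing lets each row's latent code depend on the rows it is linked to, so the model can in principle learn cross-table dependencies that a per-table tabular VAE would ignore. In practice the weights would be learned by minimising `negative_elbo` with a gradient-based framework.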

Code Implementations: 1 repo
