LG, DB | Nov 11, 2024

Hierarchical Conditional Tabular GAN for Multi-Tabular Synthetic Data Generation

arXiv:2411.07009v1 | 3 citations | h-index: 2

Originality: Incremental advance

AI Analysis

The paper addresses the need for synthetic data generation when access to real data is limited or privacy is a concern, focusing on multi-tabular datasets. The contribution is incremental, as it builds on existing single-tabular methods.

The paper tackles the problem of generating synthetic data for multi-tabular datasets with complex relationships. It proposes the HCTGAN algorithm, which samples large amounts of synthetic data more efficiently than the HMA1 model while maintaining adequate data quality and guaranteeing referential integrity.

Generating synthetic data is a state-of-the-art approach when access to real data is limited or privacy regulations restrict the use of sensitive data. A fair amount of research has been conducted on synthetic data generation for single-tabular datasets, but far less on multi-tabular datasets with complex table relationships. In this paper we propose the HCTGAN algorithm to synthesize multi-tabular data from complex multi-tabular datasets. We compare our results to the probabilistic model HMA1. Our findings show that the proposed algorithm can sample large amounts of synthetic data more efficiently for deep and complex multi-tabular datasets, whilst achieving adequate data quality and always guaranteeing referential integrity. We conclude that HCTGAN is suitable for efficiently generating large amounts of synthetic data for deep multi-tabular datasets with complex relationships. We additionally suggest that the HMA1 model be used on smaller datasets when the emphasis is on data quality.
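To make the referential integrity claim concrete, here is a minimal sketch of how one might check it on a synthetic parent/child table pair. This is not the paper's method or evaluation code. The table names, column names, and the `orphaned_rows` helper are hypothetical and chosen only for illustration. Referential integrity holds when every foreign key value in the child table matches a primary key value in the parent table.

```python
def orphaned_rows(parent_rows, child_rows, parent_key, foreign_key):
    """Return child rows whose foreign key matches no parent primary key."""
    # Collect all primary key values from the parent table.
    parent_ids = {row[parent_key] for row in parent_rows}
    # A child row violates referential integrity if its foreign key is absent.
    return [row for row in child_rows if row[foreign_key] not in parent_ids]


# Hypothetical synthetic tables: customers (parent) and orders (child).
customers = [{"customer_id": 1}, {"customer_id": 2}]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 2},
    {"order_id": 12, "customer_id": 3},  # references a customer that does not exist
]

violations = orphaned_rows(customers, orders, "customer_id", "customer_id")
print(f"{len(violations)} orphaned child row(s): {violations}")
```

A generator that guarantees referential integrity should produce an empty list from a check like this for every parent/child relationship in the schema.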
