LG · ML · Feb 25

Bayesian Generative Adversarial Networks via Gaussian Approximation for Tabular Data Synthesis

Bahrul Ilmi Nasution, Mark Elliot, Richard Allmendinger

arXiv:2602.21948v1 · 2.71 citations · h-index: 26

Originality: Incremental advance

AI Analysis

This work addresses the problem of efficient and effective synthetic data generation for tabular data, offering a simpler Bayesian approach that is incremental over existing methods.

The authors address the computational cost and the risk-utility trade-off of Bayesian GANs for tabular data synthesis by integrating a Gaussian posterior approximation (SWAG) into the CTGAN generator. Compared with CTGAN, the resulting model better preserves tabular structure and inferential statistics while reducing privacy risk.

Generative Adversarial Networks (GANs) have been used in many studies to synthesise mixed tabular data. Conditional tabular GAN (CTGAN) has been the most popular variant but struggles to navigate the risk-utility trade-off effectively. Bayesian GANs have received less attention for tabular data, although they have been explored with unstructured data such as images and text. The most widely used technique in Bayesian GANs is Markov Chain Monte Carlo (MCMC), but it is computationally intensive, particularly in terms of weight storage. In this paper, we introduce Gaussian Approximation of CTGAN (GACTGAN), which integrates the Bayesian posterior approximation technique Stochastic Weight Averaging-Gaussian (SWAG) into the CTGAN generator to synthesise tabular data, reducing computational overhead after the training phase. We demonstrate that GACTGAN yields better synthetic data than CTGAN, achieving better preservation of tabular structure and inferential statistics with less privacy risk. These results highlight GACTGAN as a simpler, effective implementation of Bayesian tabular synthesis.
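The Gaussian approximation mentioned in the abstract can be illustrated with a minimal sketch of diagonal SWAG. This is not the authors' implementation: it assumes the diagonal-covariance variant only (full SWAG also keeps a low-rank term), it uses toy NumPy vectors in place of CTGAN generator weights, and the class name `DiagonalSWAG` and its methods are hypothetical. The point it shows is that only two running-moment vectors are stored, however many weight snapshots are collected, which is where the saving in weight storage relative to keeping MCMC samples comes from.

```python
import numpy as np


class DiagonalSWAG:
    """Running first and second moments of a flattened weight vector (hypothetical helper)."""

    def __init__(self, num_params):
        self.n = 0                             # number of snapshots collected so far
        self.mean = np.zeros(num_params)       # running mean of the weights
        self.sq_mean = np.zeros(num_params)    # running mean of the squared weights

    def collect(self, weights):
        """Fold one weight snapshot into the running moments."""
        w = np.asarray(weights, dtype=float)
        self.mean = (self.n * self.mean + w) / (self.n + 1)
        self.sq_mean = (self.n * self.sq_mean + w ** 2) / (self.n + 1)
        self.n += 1

    def variance(self):
        """Diagonal variance estimate, clipped to stay strictly positive."""
        return np.clip(self.sq_mean - self.mean ** 2, 1e-12, None)

    def sample(self, rng, scale=1.0):
        """Draw one weight vector from N(mean, scale * diag(variance))."""
        std = np.sqrt(scale * self.variance())
        return self.mean + std * rng.standard_normal(self.mean.shape)


rng = np.random.default_rng(0)
swag = DiagonalSWAG(num_params=3)

# Toy stand-ins for generator weights captured at different training steps (assumption).
snapshots = [np.array([1.0, 2.0, 3.0]), np.array([3.0, 2.0, 1.0])]
for w in snapshots:
    swag.collect(w)

# One posterior sample; in a generator this vector would be loaded back as its weights.
sampled_weights = swag.sample(rng)
```

Under these assumptions, each call to `sample` would give a different set of generator weights, and synthetic records could then be produced from several sampled generators rather than from a single point estimate.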
