Invertible Tabular GANs: Killing Two Birds with One Stone for Tabular Data Synthesis
This work addresses data scarcity and privacy concerns in tabular data synthesis with a method that balances synthesis quality against privacy. The contribution is incremental, since it builds on existing GAN and invertible neural network techniques.
The paper proposes a GAN framework for tabular data synthesis that combines adversarial training with the negative log-density regularization of invertible neural networks. Depending on the direction of the regularization, it either improves synthesis quality (higher F1 scores) or enhances privacy by increasing the distance between real and fake records.
Tabular data synthesis has received wide attention in the literature. This is because available data is often limited, incomplete, or difficult to obtain, and data privacy is becoming increasingly important. In this work, we present a generalized GAN framework for tabular synthesis, which combines the adversarial training of GANs with the negative log-density regularization of invertible neural networks. The proposed framework can be used for two distinct objectives. First, we can further improve the synthesis quality by decreasing the negative log-density of real records during adversarial training. Second, by increasing the negative log-density of real records, realistic fake records can be synthesized so that they are not too close to real records, which reduces the chance of potential information leakage. We conduct experiments with real-world datasets for classification, regression, and privacy attacks. In general, the proposed method demonstrates the best synthesis quality (in terms of task-oriented evaluation metrics, e.g., F1) when decreasing the negative log-density during adversarial training. When increasing the negative log-density, our experimental results show that the distance between real and fake records increases, enhancing robustness against privacy attacks.
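
The two objectives can be illustrated with a minimal sketch of a regularized generator loss. This is not the paper's implementation: the abstract does not give the exact loss, so the additive form `adv_loss + gamma * mean_nll`, the coefficient name `gamma`, the function names, and the one-dimensional affine map standing in for an invertible neural network are all assumptions made for illustration. The negative log-density is computed with the standard change-of-variables formula for an invertible map.

```python
import math

def affine_nll(x, mu, log_sigma):
    """Negative log-density of scalar x under the invertible map
    z = (x - mu) / sigma with a standard normal base density.
    (Toy stand-in for an invertible neural network; an assumption.)"""
    z = (x - mu) * math.exp(-log_sigma)
    log_base = -0.5 * (z * z + math.log(2.0 * math.pi))  # log N(z; 0, 1)
    log_det = -log_sigma                                  # log |dz/dx|
    return -(log_base + log_det)

def regularized_generator_loss(adv_loss, real_records, mu, log_sigma, gamma):
    """Hypothetical combined loss: adversarial term plus gamma times the
    mean negative log-density of real records.
    Minimizing with gamma > 0 decreases the NLL (quality objective);
    minimizing with gamma < 0 increases the NLL (privacy objective)."""
    mean_nll = sum(affine_nll(x, mu, log_sigma) for x in real_records) / len(real_records)
    return adv_loss + gamma * mean_nll

# Illustrative values only; not taken from the paper.
real = [0.1, -0.3, 0.5]
quality_loss = regularized_generator_loss(0.7, real, mu=0.0, log_sigma=0.0, gamma=0.1)
privacy_loss = regularized_generator_loss(0.7, real, mu=0.0, log_sigma=0.0, gamma=-0.1)
```

Under this assumed form, only the sign of `gamma` switches between the two uses of the framework; the adversarial term is unchanged.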