Synthesizing Tabular Data Using Selectivity Enhanced Generative Adversarial Networks
The work addresses inefficient data synthesis for E-commerce platforms during high-traffic events; the contribution is incremental, since it builds on existing GAN methods.
This paper tackles tabular data generation for E-commerce stress testing, where existing GAN methods overlook the computational demands of processing the generated data. It introduces a GAN-based approach with query selectivity constraints that improves selectivity estimation accuracy by up to 20% and machine learning utility by up to 6% on five real-world datasets.
As E-commerce platforms face surging transactions during major shopping events like Black Friday, stress testing with synthesized data is crucial for resource planning. Most recent studies use Generative Adversarial Networks (GANs) to generate tabular data while ensuring privacy and machine learning utility. However, these methods overlook the computational demands of processing GAN-generated data, making them unsuitable for E-commerce stress testing. This thesis introduces a novel GAN-based approach incorporating query selectivity constraints, a key factor in database transaction processing. We integrate a pre-trained deep neural network to maintain selectivity consistency between real and synthetic data. Our method, tested on five real-world datasets, outperforms three state-of-the-art GANs and a VAE model, improving selectivity estimation accuracy by up to 20% and machine learning utility by up to 6%.
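To make the notion of selectivity consistency concrete, the following is a minimal sketch, not the authors' implementation. It assumes a query is a conjunction of inclusive range predicates and computes selectivity by exact counting, whereas the abstract describes a pre-trained deep neural network in that role. The names `selectivity`, `selectivity_gap`, and the toy tables and workload are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

Row = Dict[str, float]
# Assumed query form: column name -> (low, high), both bounds inclusive,
# with all predicates combined by AND.
Query = Dict[str, Tuple[float, float]]


def selectivity(rows: List[Row], query: Query) -> float:
    """Fraction of rows that satisfy every range predicate in the query."""
    if not rows:
        return 0.0
    matches = sum(
        all(lo <= row[col] <= hi for col, (lo, hi) in query.items())
        for row in rows
    )
    return matches / len(rows)


def selectivity_gap(real: List[Row], synthetic: List[Row],
                    queries: List[Query]) -> float:
    """Mean absolute selectivity difference over a query workload."""
    if not queries:
        return 0.0
    total = sum(
        abs(selectivity(real, q) - selectivity(synthetic, q))
        for q in queries
    )
    return total / len(queries)


# Hypothetical toy tables and workload, for illustration only.
real = [
    {"price": 10.0, "qty": 1.0},
    {"price": 20.0, "qty": 2.0},
    {"price": 30.0, "qty": 3.0},
    {"price": 40.0, "qty": 4.0},
]
synthetic = [
    {"price": 12.0, "qty": 1.0},
    {"price": 18.0, "qty": 2.0},
    {"price": 35.0, "qty": 1.0},
    {"price": 50.0, "qty": 4.0},
]
queries = [
    {"price": (0.0, 25.0)},
    {"price": (15.0, 45.0), "qty": (2.0, 4.0)},
]

gap = selectivity_gap(real, synthetic, queries)
print(f"mean selectivity gap: {gap:.2f}")
```

A gap near zero means the same queries return a similar share of rows on both tables, so a database would do comparable work on the synthetic data. In a training setting, a differentiable estimate of such a gap could plausibly serve as a penalty on the generator; how the authors actually formulate the constraint is not stated in the abstract.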