LGOct 12, 2022

FCT-GAN: Enhancing Table Synthesis via Fourier Transform

arXiv:2210.06239v18 citationsh-index: 29
Originality Incremental advance
AI Analysis

This work addresses data privacy regulations like GDPR by enhancing tabular data synthesis for domains requiring secure data sharing, though it is incremental as it builds on existing GAN methods.

The paper tackles the problem of generating synthetic tabular data by addressing overlooked properties like global correlation and permutation invariance, proposing FCT-GAN which improves machine learning utility by up to 27.8% and statistical similarity by up to 26.5% compared to state-of-the-art baselines.

Synthetic tabular data emerges as an alternative for sharing knowledge while adhering to restrictive data access regulations, e.g., European General Data Protection Regulation (GDPR). Mainstream state-of-the-art tabular data synthesizers draw methodologies from Generative Adversarial Networks (GANs), which are composed of a generator and a discriminator. While convolution neural networks are shown to be a better architecture than fully connected networks for tabular data synthesizing, two key properties of tabular data are overlooked: (i) the global correlation across columns, and (ii) invariant synthesizing to column permutations of input data. To address the above problems, we propose a Fourier conditional tabular generative adversarial network (FCT-GAN). We introduce feature tokenization and Fourier networks to construct a transformer-style generator and discriminator, and capture both local and global dependencies across columns. The tokenizer captures local spatial features and transforms original data into tokens. Fourier networks transform tokens to frequency domains and element-wisely multiply a learnable filter. Extensive evaluation on benchmarks and real-world data shows that FCT-GAN can synthesize tabular data with high machine learning utility (up to 27.8% better than state-of-the-art baselines) and high statistical similarity to the original data (up to 26.5% better), while maintaining the global correlation across columns, especially on high dimensional dataset.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes