LG | Mar 7, 2022

DATGAN: Integrating expert knowledge into deep learning for synthetic tabular data

arXiv:2203.03489v1 | 23 citations | h-index: 69
AI Analysis

The paper addresses the challenge of generating reliable synthetic tabular data for applications such as bias correction and simulation. The contribution is incremental, since it builds on existing GAN methods.

The paper tackles the difficulty of controlling the generation process in GANs for synthetic tabular data; without such control, the generated data can lack representativity and carry bias. It introduces DATGAN, which integrates expert knowledge through a Directed Acyclic Graph (DAG). The best DATGAN versions outperform state-of-the-art generative models on multiple case studies.

Synthetic data can be used in various applications, such as correcting biased datasets or replacing scarce original data for simulation purposes. Generative Adversarial Networks (GANs) are considered state-of-the-art for developing generative models. However, these deep learning models are data-driven, so it is difficult to control the generation process. This can lead to the following issues: lack of representativity in the generated data, the introduction of bias, and the possibility of overfitting the sample's noise. This article presents the Directed Acyclic Tabular GAN (DATGAN) to address these limitations by integrating expert knowledge in deep learning models for synthetic tabular data generation. This approach allows the interactions between variables to be specified explicitly using a Directed Acyclic Graph (DAG). The DAG is then converted to a network of Long Short-Term Memory (LSTM) cells, modified to accept multiple inputs. Multiple DATGAN versions are systematically tested on multiple assessment metrics. We show that the best versions of the DATGAN outperform state-of-the-art generative models on multiple case studies. Finally, we show how the DAG can create hypothetical synthetic datasets.
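To make the DAG idea concrete, here is a minimal sketch of how an expert-specified graph over table columns could be encoded and turned into a generation order in which every variable comes after its parents. It is not the DATGAN implementation: the variable names, the `build_parent_map` and `generation_order` helpers, and the use of Python's standard-library `graphlib` are all assumptions made for illustration. The sketch also lists the variables with more than one parent, which are the cases where a cell would need to accept multiple inputs.

```python
from graphlib import TopologicalSorter

# Hypothetical expert knowledge: (parent, child) pairs between table columns.
# These column names are illustrative only and do not come from the paper.
edges = [
    ("age", "license"),
    ("age", "income"),
    ("income", "car_ownership"),
    ("license", "car_ownership"),
    ("car_ownership", "travel_mode"),
]


def build_parent_map(edge_list):
    """Map every variable to the set of its direct parents."""
    parents = {}
    for parent, child in edge_list:
        parents.setdefault(parent, set())
        parents.setdefault(child, set()).add(parent)
    return parents


def generation_order(parents):
    """Order variables so that each one appears after all of its parents.

    graphlib raises CycleError (a ValueError subclass) if the graph has a
    cycle, i.e. if the specification is not a valid DAG.
    """
    return list(TopologicalSorter(parents).static_order())


parents = build_parent_map(edges)
order = generation_order(parents)

# Variables with several parents are the ones that would need multiple inputs.
multi_input = sorted(v for v, p in parents.items() if len(p) > 1)

print("generation order:", order)
print("variables with multiple parents:", multi_input)
```

Rejecting cyclic specifications up front matters because a cycle would leave no valid order in which to generate the variables.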

Code Implementations: 1 repo