LG · MS · Jul 28, 2022

Sequential Models in the Synthetic Data Vault

arXiv:2207.14406v1 · 25 citations · h-index: 37 · Has Code
Originality: Incremental advance
AI Analysis

The paper addresses the problem of generating realistic synthetic sequential data for data scientists and researchers, offering an incremental improvement over existing methods.

The paper tackles synthetic sequential data generation by introducing a Sequential model in the Synthetic Data Vault, built on a novel CPAR neural network. Using a new MSAS metric, it shows that this model learns higher-level patterns than non-sequential models like CTGAN without quality trade-offs.

The goal of this paper is to describe a system for generating synthetic sequential data within the Synthetic Data Vault. To achieve this, we present the Sequential model currently in SDV, an end-to-end framework that builds a generative model for multi-sequence, real-world data. This includes a novel neural network-based machine learning model, the conditional probabilistic auto-regressive (CPAR) model. The overall system and the model are available in the open source Synthetic Data Vault (SDV) library (https://github.com/sdv-dev/SDV), along with a variety of other models for different synthetic data needs. After building the Sequential SDV, we used it to generate synthetic data and compared its quality against an existing, non-sequential model based on generative adversarial networks, called CTGAN. To compare the sequential synthetic data against its real counterpart, we invented a new metric called Multi-Sequence Aggregate Similarity (MSAS). We used it to conclude that our Sequential SDV model learns higher-level patterns than non-sequential models without any trade-offs in synthetic data quality.
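
The abstract names MSAS but does not define it. The sketch below illustrates one plausible reading, offered purely as an assumption: reduce each sequence to a single aggregate statistic (here the mean), then compare the distribution of that statistic across real and synthetic sequences using the complement of the Kolmogorov-Smirnov statistic. The function names `ks_complement` and `aggregate_similarity` are hypothetical and are not taken from the paper or the SDV library.

```python
from bisect import bisect_right
from statistics import mean


def ks_complement(real_values, synthetic_values):
    """Return 1 - KS statistic between two samples (1.0 means identical CDFs)."""
    real_sorted = sorted(real_values)
    synth_sorted = sorted(synthetic_values)
    max_gap = 0.0
    # The largest gap between two empirical CDFs occurs at an observed value.
    for x in set(real_sorted) | set(synth_sorted):
        real_cdf = bisect_right(real_sorted, x) / len(real_sorted)
        synth_cdf = bisect_right(synth_sorted, x) / len(synth_sorted)
        max_gap = max(max_gap, abs(real_cdf - synth_cdf))
    return 1.0 - max_gap


def aggregate_similarity(real_sequences, synthetic_sequences, statistic=mean):
    """Hypothetical MSAS-style score (assumed definition, not the paper's).

    Each argument maps a sequence id to a list of numeric values.
    """
    # Collapse every sequence to one number, e.g. its mean.
    real_stats = [statistic(seq) for seq in real_sequences.values()]
    synth_stats = [statistic(seq) for seq in synthetic_sequences.values()]
    # Compare how that number is distributed across sequences.
    return ks_complement(real_stats, synth_stats)


real = {"a": [1, 1], "b": [3, 3]}
synthetic = {"x": [1, 1], "y": [5, 5]}
score = aggregate_similarity(real, synthetic)
```

Under this assumed definition, a score near 1 means the per-sequence statistic is distributed similarly in the real and synthetic data, and a score near 0 means the two distributions barely overlap. Passing a different `statistic` (for example `len` to compare sequence lengths) probes a different sequence-level property.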

Code Implementations: 1 repo
