LG, AI · Mar 30, 2022

Generation and Simulation of Synthetic Datasets with Copulas

arXiv:2203.17250v1 · 13 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for high-quality synthetic data generation in data science, though it appears incremental because it builds on established copula models.

The paper tackles the problem of generating synthetic datasets that accurately resemble real data in both marginal and joint distributions, proposing a new copula-based method that outperforms existing techniques like SMOTE and autoencoders on two datasets.

This paper proposes a new method to generate synthetic datasets based on copula models. Our goal is to produce surrogate data resembling real data in terms of marginal and joint distributions. We present a complete and reliable algorithm for generating a synthetic dataset comprising numeric or categorical variables. Applied to two datasets, our methodology performs better than other methods such as SMOTE and autoencoders.
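The abstract does not specify which copula family or marginal estimators the authors use, so the following is only a minimal illustration of the general copula idea, not the paper's algorithm. It assumes a Gaussian copula with empirical marginals and purely numeric columns. Dependence is captured by rank-transforming each column to normal scores and estimating their correlation. Sampling then draws correlated normals and maps them back through each column's empirical quantiles. The function names `fit_gaussian_copula` and `sample_gaussian_copula` are hypothetical.

```python
from typing import Optional

import numpy as np
from statistics import NormalDist

# Standard normal N(0, 1), used for the probability-integral transforms.
_STD_NORMAL = NormalDist()
_norm_cdf = np.vectorize(_STD_NORMAL.cdf)
_norm_ppf = np.vectorize(_STD_NORMAL.inv_cdf)


def fit_gaussian_copula(data: np.ndarray) -> np.ndarray:
    """Estimate a Gaussian-copula correlation matrix from numeric data of shape (n, d)."""
    n = data.shape[0]
    # Rank-transform each column into pseudo-observations strictly inside (0, 1);
    # this discards the marginals and keeps only the dependence structure.
    ranks = data.argsort(axis=0).argsort(axis=0) + 1
    u = ranks / (n + 1)
    # Map the pseudo-observations to normal scores and take their correlation.
    z = _norm_ppf(u)
    return np.atleast_2d(np.corrcoef(z, rowvar=False))


def sample_gaussian_copula(data: np.ndarray, corr: np.ndarray,
                           n_samples: int, seed: Optional[int] = None) -> np.ndarray:
    """Draw synthetic rows whose dependence follows `corr` and whose marginals follow `data`."""
    rng = np.random.default_rng(seed)
    d = data.shape[1]
    # Correlated standard normals -> correlated uniforms via the normal CDF.
    z = rng.multivariate_normal(np.zeros(d), corr, size=n_samples)
    u = _norm_cdf(z)
    # Invert each empirical marginal: the uniform in column j selects a quantile of real column j.
    synthetic = np.empty((n_samples, d))
    for j in range(d):
        synthetic[:, j] = np.quantile(data[:, j], u[:, j])
    return synthetic


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=500)
    # Toy "real" data: a normal column and a skewed, monotonically related column.
    real = np.column_stack([x, np.exp(x) + 0.1 * rng.normal(size=500)])
    corr = fit_gaussian_copula(real)
    fake = sample_gaussian_copula(real, corr, n_samples=1000, seed=1)
    print(fake.shape)  # (1000, 2)
```

Because the marginals are inverted through empirical quantiles, every synthetic value stays within the observed range of its column, while the rank dependence between columns is approximately reproduced. Categorical variables, which the paper's method also supports, would need an additional encoding step that this sketch does not attempt.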
