LG AI, Mar 30, 2022

Generation and Simulation of Synthetic Datasets with Copulas

Regis Houssou, Mihai-Cezar Augustin, Efstratios Rappos, Vivien Bonvin, Stephan Robert-Nicoud

arXiv:2203.17250v1, 9.613 citations

Originality: Incremental advance

AI Analysis

This paper addresses the need for high-quality synthetic data generation in data science, though it appears incremental because it builds on established copula models.

The paper tackles the problem of generating synthetic datasets that accurately resemble real data in both marginal and joint distributions, proposing a new copula-based method that outperforms existing techniques like SMOTE and autoencoders on two datasets.

This paper proposes a new method to generate synthetic data sets based on copula models. Our goal is to produce surrogate data resembling real data in terms of marginal and joint distributions. We present a complete and reliable algorithm for generating a synthetic data set comprising numeric or categorical variables. Applying our methodology to two datasets shows better performance compared to other methods such as SMOTE and autoencoders.
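The abstract does not say which copula family or fitting procedure the authors use, so the following is only a minimal sketch of the general idea, assuming a Gaussian copula with empirical marginals. It is not the paper's algorithm. The function names (`normal_scores`, `pearson`, `cholesky`, `sample_gaussian_copula`) are hypothetical, and the sketch assumes numeric columns that are not perfectly correlated with each other.

```python
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and standard deviation 1


def normal_scores(column):
    """Map a numeric column to normal scores through its empirical ranks."""
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        # rank / (n + 1) lies strictly inside (0, 1), so inv_cdf is defined.
        scores[i] = _STD_NORMAL.inv_cdf(rank / (n + 1))
    return scores


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a positive-definite matrix."""
    d = len(matrix)
    lower = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = (matrix[i][i] - s) ** 0.5
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def sample_gaussian_copula(columns, n_samples, seed=0):
    """Draw synthetic rows whose marginals and dependence mimic `columns`.

    `columns` is a list of equal-length numeric lists, one per variable.
    """
    rng = random.Random(seed)
    d, n = len(columns), len(columns[0])
    # Step 1: estimate the copula correlation from the normal scores.
    scores = [normal_scores(c) for c in columns]
    corr = [[1.0 if i == j else pearson(scores[i], scores[j])
             for j in range(d)] for i in range(d)]
    lower = cholesky(corr)
    sorted_cols = [sorted(c) for c in columns]
    synthetic = []
    for _ in range(n_samples):
        # Step 2: draw correlated standard normals.
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        row = []
        for i in range(d):
            zi = sum(lower[i][k] * z[k] for k in range(i + 1))
            # Step 3: map to a uniform, then through the empirical quantiles.
            u = _STD_NORMAL.cdf(zi)
            row.append(sorted_cols[i][min(int(u * n), n - 1)])
        synthetic.append(row)
    return synthetic
```

Because each synthetic value is read off the empirical quantiles, the sketch only reproduces values already observed in each column; a smoothed or parametric marginal would be needed to generate unseen values. How the paper handles categorical variables is not covered by this sketch.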
