Differentially Private Non Parametric Copulas: Generating synthetic data with non parametric copulas under privacy guarantees
This work addresses privacy concerns in synthetic data generation for mixed tabular databases, representing an incremental improvement over existing methods.
The paper addresses synthetic data generation under privacy guarantees by adding differential privacy to a non-parametric copula model. The resulting model, DPNPC, outperforms the compared models in preserving multivariate dependencies, maintains privacy for small ε values, and reduces training times.
Synthetic data models have brought significant advances across diverse scientific fields, but they also raise important privacy concerns for users. This work enhances a non-parametric copula-based synthetic data generation model by incorporating Differential Privacy through an Enhanced Fourier Perturbation method, yielding DPNPC. The model generates synthetic data for mixed tabular databases while preserving privacy. We compare DPNPC with three other models (PrivBayes, DP-Copula, and DP-Histogram) on three public datasets, evaluating privacy, utility, and execution time. DPNPC outperforms the others in modeling multivariate dependencies, maintaining privacy for small $ε$ values, and reducing training times. Two limitations remain: the model's performance has not been assessed with different encoding methods, and additional privacy attacks have not been considered. Future research should address both areas to improve privacy-preserving synthetic data generation.
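To make the role of $ε$ concrete, the following is a minimal sketch of the simplest differentially private building block: a Laplace-noised histogram over one categorical column, from which synthetic values are then sampled. It is not the DPNPC algorithm, not the Enhanced Fourier Perturbation method, and not the DP-Histogram implementation evaluated in the paper. The function names, the toy column, and the add/remove-one-record neighbouring assumption (L1 sensitivity of 1) are all illustrative assumptions.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_histogram(values, bins, epsilon, rng):
    """Return noisy, normalised bin probabilities for one categorical column.

    Assumption: neighbouring datasets differ by adding or removing one
    record, so the count vector has L1 sensitivity 1 and the Laplace
    scale is 1 / epsilon. `bins` must be fixed independently of the data.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    counts = Counter(values)
    noisy = [counts.get(b, 0) + laplace_noise(1.0 / epsilon, rng) for b in bins]
    # Clipping and normalising are post-processing and cost no extra privacy.
    clipped = [max(c, 0.0) for c in noisy]
    total = sum(clipped)
    if total == 0:
        return [1.0 / len(bins)] * len(bins)
    return [c / total for c in clipped]


def sample_synthetic(bins, probs, n, rng):
    # Draw n synthetic values from the noisy distribution.
    return rng.choices(bins, weights=probs, k=n)


rng = random.Random(0)
column = ["a"] * 60 + ["b"] * 30 + ["c"] * 10  # toy data, true shares 0.6 / 0.3 / 0.1
bins = ["a", "b", "c"]
for eps in (0.1, 1.0, 10.0):
    probs = dp_histogram(column, bins, eps, rng)
    print(eps, [round(p, 3) for p in probs])
synthetic = sample_synthetic(bins, probs, 5, rng)
```

Smaller values of `epsilon` give a larger noise scale, so the printed probabilities tend to drift further from the true shares. That is the privacy and utility trade-off the comparison above evaluates.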