Disjoint Generative Models
This work addresses privacy concerns in synthetic data generation for applications such as tabular data analysis, though it appears incremental: it builds on existing generative models and adds a new partitioning and joining framework.
The paper tackles the problem of generating synthetic datasets with enhanced privacy by partitioning data into disjoint subsets, training separate generative models on each, and combining results without common identifiers, achieving significantly increased privacy at only a low utility cost.
We propose a new framework for generating cross-sectional synthetic datasets via disjoint generative models. In this paradigm, a dataset is partitioned into disjoint subsets that are supplied to separate instances of generative models. The results are then combined post hoc by a joining operation that works in the absence of common variables/identifiers. The success of the framework is demonstrated through several case studies and examples on tabular data that help illuminate some of the design choices that one may make. The principal benefit of disjoint generative models is significantly increased privacy at only a low utility cost. Additional findings include increased effectiveness and feasibility for certain model types and the possibility of mixed-model synthesis.
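The partition, train, and join pipeline described above can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: the partition is over columns, each generative model is replaced by a toy `BootstrapGenerator` that resamples rows of its own partition, and the join simply pairs independently shuffled synthetic rows. All function and class names are hypothetical, and the paper's actual joining operation may differ.

```python
import random


def partition_columns(columns, n_parts, rng):
    """Split column names into n_parts disjoint, non-empty groups."""
    if not 1 <= n_parts <= len(columns):
        raise ValueError("n_parts must be between 1 and the number of columns")
    shuffled = list(columns)
    rng.shuffle(shuffled)
    return [shuffled[i::n_parts] for i in range(n_parts)]


class BootstrapGenerator:
    """Toy stand-in for a generative model (an assumption, not the paper's
    model): it resamples whole rows of the partition it was fitted on."""

    def fit(self, rows):
        self.rows = [dict(r) for r in rows]
        return self

    def sample(self, n, rng):
        return [dict(rng.choice(self.rows)) for _ in range(n)]


def random_join(parts, rng):
    """Join synthetic partitions without any shared identifier by shuffling
    each partition independently and concatenating rows position by position.
    This placeholder join is an assumption for illustration only."""
    shuffled = [rng.sample(p, len(p)) for p in parts]
    joined = []
    for pieces in zip(*shuffled):
        row = {}
        for piece in pieces:
            row.update(piece)
        joined.append(row)
    return joined


def disjoint_synthesis(data, n_parts, n_samples, seed=0):
    """Partition columns, fit one generator per partition, then join."""
    rng = random.Random(seed)
    groups = partition_columns(list(data[0].keys()), n_parts, rng)
    parts = []
    for group in groups:
        # Each model only ever sees its own disjoint subset of columns.
        subset = [{c: row[c] for c in group} for row in data]
        model = BootstrapGenerator().fit(subset)
        parts.append(model.sample(n_samples, rng))
    return groups, random_join(parts, rng)


# Made-up example records, for illustration only.
data = [
    {"age": 34, "sex": "F", "income": 52000, "smoker": False},
    {"age": 51, "sex": "M", "income": 61000, "smoker": True},
    {"age": 27, "sex": "F", "income": 38000, "smoker": False},
    {"age": 45, "sex": "M", "income": 74000, "smoker": False},
]
groups, synthetic = disjoint_synthesis(data, n_parts=2, n_samples=5)
```

Because no single model sees a complete record, a random join like this one breaks any dependence between columns in different partitions. That is a rough picture of the privacy and utility trade-off the abstract describes; a more careful join would aim to recover part of that cross-partition structure.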