LG · ML · Mar 20, 2025

TVineSynth: A Truncated C-Vine Copula Generator of Synthetic Tabular Data to Balance Privacy and Utility

arXiv:2503.15972v1 · 1 citation · h-index: 45 · AISTATS
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of protecting sensitive information in synthetic data for applications such as machine learning, though it appears incremental because it builds on existing vine copula methods.

The authors tackle the problem of generating synthetic tabular data that balances privacy and utility. They propose TVineSynth, which uses a truncated vine copula model to zero out privacy-leaking dependencies while keeping those that are useful, and report a better privacy-utility balance than competing models.

We propose TVineSynth, a vine copula based synthetic tabular data generator designed to balance privacy and utility, using the vine tree structure and its truncation to control the trade-off. In contrast to synthetic data generators that achieve differential privacy (DP) by globally adding noise, TVineSynth performs a controlled approximation of the estimated data-generating distribution, so that the resulting synthetic data does not suffer from poor utility in downstream prediction tasks. TVineSynth introduces a targeted bias into the vine copula model that, combined with the specific tree structure of the vine, causes the model to zero out privacy-leaking dependencies while relying on those that are beneficial for utility. Privacy is measured here with membership inference attacks (MIA) and attribute inference attacks (AIA). Further, we theoretically justify how the construction of TVineSynth ensures AIA privacy under a natural privacy measure for continuous sensitive attributes. When compared to competitor models, with and without DP, on simulated and on real-world data, TVineSynth achieves a superior privacy-utility balance.
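To make the truncation idea in the abstract more concrete, here is a minimal sketch of what truncating a C-vine after its first tree does to the dependence structure. It is not the authors' implementation. It assumes Gaussian pair copulas and works directly on the standard-normal scale (no marginals are fitted), and the function names `sample_one_truncated_gaussian_cvine` and `partial_corr_given_root` are hypothetical. Under these assumptions every non-root variable depends on the others only through the root, so the conditional (partial) dependence between non-root variables is zeroed out.

```python
import numpy as np


def sample_one_truncated_gaussian_cvine(rho_to_root, n, seed=0):
    """Sample from a Gaussian C-vine truncated after the first tree.

    Assumption for illustration: all pair copulas are Gaussian and samples
    stay on the standard-normal scale. Column 0 is the root variable;
    rho_to_root[k] is the correlation between the root and column k + 1.
    """
    rng = np.random.default_rng(seed)
    rho = np.asarray(rho_to_root, dtype=float)
    root = rng.standard_normal(n)
    noise = rng.standard_normal((n, rho.size))
    # Each non-root variable is driven by the root plus independent noise,
    # so non-root variables are conditionally independent given the root.
    others = rho * root[:, None] + np.sqrt(1.0 - rho**2) * noise
    return np.column_stack([root, others])


def partial_corr_given_root(z, i, j):
    """Sample partial correlation of columns i and j given column 0."""
    c = np.corrcoef(z, rowvar=False)
    num = c[i, j] - c[i, 0] * c[j, 0]
    den = np.sqrt((1.0 - c[i, 0] ** 2) * (1.0 - c[j, 0] ** 2))
    return num / den


z = sample_one_truncated_gaussian_cvine([0.8, 0.5], n=50_000)
corr = np.corrcoef(z, rowvar=False)
print(corr.round(2))                              # pairwise correlations
print(round(partial_corr_given_root(z, 1, 2), 3))  # close to zero by construction
```

In this toy setting the marginal correlation between the two non-root columns is roughly the product of their correlations with the root, while their partial correlation given the root is approximately zero. Which variables are placed where in the vine, and at which level the vine is truncated, are the design choices the paper uses to trade privacy against utility; the sketch does not reproduce those choices.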

Code Implementations: 1 repo
