LG AIMay 7

On Privacy Leakage in Tabular Diffusion Models: Influential Factors, Attacker Knowledge, and Metrics

Masoumeh Shafieinejad, D. B. Emerson, Behnoosh Zamanlooy, Elaheh Bassak, Fatemeh Tavakoli, Sara Kodeiri, Marcelo Lotif, Xi He

arXiv:2605.0683547.2

AI Analysis

For practitioners using tabular diffusion models, this work provides a systematic analysis of factors affecting privacy leakage and warns against over-reliance on common privacy metrics.

This work quantifies privacy leakage in tabular diffusion models using membership inference attacks, showing that attackers need not have perfect knowledge or massive resources to succeed, and reveals pitfalls of heuristic privacy metrics.

Tabular data plays an important role in many fields and industries, including those with elevated privacy considerations and risks. As such, there is a rising interest in generating high-quality synthetic proxies for real tabular data as a means of reducing privacy risk and proprietary data exposure. With tabular diffusion models (TDMs) demonstrating leading performance in synthesizing such data, understanding and measuring the privacy risks associated with these models is imperative. Leveraging state-of-the-art membership inference attacks for TDMs in both black- and white-box settings, this work quantifies the impact of training setup, synthesis choices, and attacker knowledge on privacy leakage. Moreover, the results demonstrate that adversaries need not have perfect knowledge of the training setup, identical data distributions, or massive compute resources to construct successful attacks. Finally, the pitfalls associated with applying heuristic privacy metrics, such as distance-to-closest record, are revealed.

View on arXiv PDF

Similar