MLLGNov 25, 2024

DP-CDA: An Algorithm for Enhanced Privacy Preservation in Dataset Synthesis Through Randomized Mixing

arXiv:2411.16121v21 citationsh-index: 15
AI Analysis

This addresses privacy concerns in data publishing for sectors like healthcare and finance, but appears incremental as it builds on existing methods with specific improvements.

The paper tackled the problem of privacy preservation in high-dimensional dataset synthesis by introducing DP-CDA, an algorithm that uses randomized mixing to generate synthetic data with formal privacy guarantees, achieving superior utility compared to traditional methods under the same privacy requirements.

In recent years, the growth of data across various sectors, including healthcare, security, finance, and education, has created significant opportunities for analysis and informed decision-making. However, these datasets often contain sensitive and personal information, which raises serious privacy concerns. Protecting individual privacy is crucial, yet many existing machine learning and data publishing algorithms struggle with high-dimensional data, facing challenges related to computational efficiency and privacy preservation. To address these challenges, we introduce an effective data publishing algorithm \emph{DP-CDA}. Our proposed algorithm generates synthetic datasets by randomly mixing data in a class-specific manner, and inducing carefully-tuned randomness to ensure formal privacy guarantees. Our comprehensive privacy accounting shows that DP-CDA provides a stronger privacy guarantee compared to existing methods, allowing for better utility while maintaining strict level of privacy. To evaluate the effectiveness of DP-CDA, we examine the accuracy of predictive models trained on the synthetic data, which serves as a measure of dataset utility. Importantly, we identify an optimal order of mixing that balances privacy guarantee with predictive accuracy. Our results indicate that synthetic datasets produced using the DP-CDA can achieve superior utility compared to those generated by traditional data publishing algorithms, even when subject to the same privacy requirements.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes