LG, CR · Feb 7, 2025

Training Set Reconstruction from Differentially Private Forests: How Effective is DP?

arXiv:2502.05307v3 · 1 citation · h-index: 8
Originality: Incremental advance
AI Analysis

This work addresses privacy risks for users of DP-protected machine learning models, showing that current DP implementations may be insufficient. The contribution is rated incremental because it builds on known vulnerabilities.

The paper tackles training data leakage from differentially private random forests. It introduces a reconstruction attack based on constraint programming and shows that even forests trained with meaningful DP guarantees can leak portions of their training data. The only forests fully robust to the attack perform no better than a constant classifier.

Recent research has shown that structured machine learning models such as tree ensembles are vulnerable to privacy attacks targeting their training data. To mitigate these risks, differential privacy (DP) has become a widely adopted countermeasure, as it offers rigorous privacy protection. In this paper, we introduce a reconstruction attack targeting state-of-the-art $ε$-DP random forests. By leveraging a constraint programming model that incorporates knowledge of the forest's structure and DP mechanism characteristics, our approach formally reconstructs the most likely dataset that could have produced a given forest. Through extensive computational experiments, we examine the interplay between model utility, privacy guarantees and reconstruction accuracy across various configurations. Our results reveal that random forests trained with meaningful DP guarantees can still leak portions of their training data. Specifically, while DP reduces the success of reconstruction attacks, the only forests fully robust to our attack exhibit predictive performance no better than a constant classifier. Building on these insights, we also provide practical recommendations for the construction of DP random forests that are more resilient to reconstruction attacks while maintaining a non-trivial predictive performance.
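To make the phrase "DP mechanism characteristics" concrete, here is a minimal sketch under stated assumptions. It assumes, hypothetically, that each leaf of a DP forest releases per-class counts perturbed by Laplace noise with scale 1/ε (sensitivity 1, with no budget splitting across trees); the abstract does not specify the mechanism. All function names are illustrative and are not taken from the paper's code. The sketch shows how an attacker who knows the noise distribution can score candidate true counts by likelihood. It is a toy, per-leaf stand-in for the idea of finding the "most likely dataset", not the paper's constraint programming model.

```python
import math
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize_counts(counts, epsilon, rng):
    # Assumed release mechanism: Laplace noise with scale 1/epsilon
    # added independently to each class count (sensitivity 1).
    scale = 1.0 / epsilon
    return [c + laplace_noise(scale, rng) for c in counts]


def log_likelihood(candidate, noisy, epsilon):
    # Log-density of observing `noisy` if the true counts were `candidate`,
    # under the assumed Laplace(0, 1/epsilon) noise.
    return sum(
        math.log(epsilon / 2.0) - epsilon * abs(n - c)
        for c, n in zip(candidate, noisy)
    )


def most_likely_counts(noisy):
    # Per-count maximiser over non-negative integers:
    # the nearest integer to the noisy value, clamped at zero.
    return [max(0, math.floor(n + 0.5)) for n in noisy]


if __name__ == "__main__":
    rng = random.Random(0)
    true_counts = [12, 3]  # hypothetical class counts in one leaf
    for eps in (0.1, 1.0, 10.0):
        noisy = privatize_counts(true_counts, eps, rng)
        print(eps, most_likely_counts(noisy))
```

In this toy setting a larger ε means less noise, so the per-leaf estimate is more often close to the true counts. The attack described in the abstract additionally reasons jointly over the structure of the whole forest, which this sketch does not attempt.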

Code Implementations: 1 repo
