LG CR ML | Dec 19, 2020

Scalable and Provably Accurate Algorithms for Differentially Private Distributed Decision Tree Learning

Kaiwen Wang, Travis Dick, Maria-Florina Balcan

arXiv:2012.10602v3 | 5.85 citations | Has Code

Originality: Highly original

AI Analysis

This work gives organizations that need to analyze sensitive data held by multiple parties a set of scalable, provably accurate algorithms for differentially private decision tree learning, so the data can be used without compromising individual privacy. It is a step forward for privacy-preserving machine learning.

This paper presents the first provably accurate algorithms for differentially private, top-down decision tree learning in a distributed setting. It proposes DP-TopDown, a general algorithm with two distributed implementations: NoisyCounts, which extends a single-machine algorithm using the Laplace mechanism, and LocalRNM, which reduces communication and added noise by performing local optimization at each data holder. The accompanying utility guarantees show that the error of the privately learned decision tree goes to zero when the dataset is sufficiently large.

This paper introduces the first provably accurate algorithms for differentially private, top-down decision tree learning in the distributed setting (Balcan et al., 2012). We propose DP-TopDown, a general privacy-preserving decision tree learning algorithm, and present two distributed implementations. Our first method, NoisyCounts, naturally extends the single-machine algorithm by using the Laplace mechanism. Our second method, LocalRNM, significantly reduces communication and added noise by performing local optimization at each data holder. We provide the first utility guarantees for differentially private top-down decision tree learning in both the single-machine and distributed settings. These guarantees show that the error of the privately learned decision tree quickly goes to zero provided that the dataset is sufficiently large. Our extensive experiments on real datasets illustrate the trade-offs between privacy, accuracy, and generalization when learning private decision trees in the distributed setting.
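To make the idea of choosing a split from Laplace-noised counts concrete, here is a minimal Python sketch. It is not the paper's NoisyCounts implementation; it only illustrates the general pattern the abstract describes. Everything specific is an assumption made for illustration: binary features, add-or-remove-one-record neighboring datasets, disjoint records across data holders, the Gini impurity as the splitting criterion, and the hypothetical function names `local_counts`, `noisy_counts`, `gini_after_split`, and `choose_split`.

```python
import numpy as np


def local_counts(X, y, n_labels):
    """Count table of shape (n_features, 2, n_labels) for binary features.

    Entry [j, b, c] is the number of records with feature j == b and label c.
    """
    n_features = X.shape[1]
    table = np.zeros((n_features, 2, n_labels))
    for j in range(n_features):
        for branch in (0, 1):
            mask = X[:, j] == branch
            table[j, branch] = np.bincount(y[mask], minlength=n_labels)
    return table


def noisy_counts(table, epsilon, rng):
    """Add Laplace noise to every cell of one holder's count table.

    Assumption: adding or removing one record changes exactly one cell per
    candidate feature, so the L1 sensitivity of the whole table equals the
    number of candidate features and the noise scale is n_features / epsilon.
    """
    scale = table.shape[0] / epsilon
    return table + rng.laplace(0.0, scale, size=table.shape)


def gini_after_split(counts):
    """Weighted Gini impurity of a (2, n_labels) count table."""
    counts = np.clip(counts, 0.0, None)  # noisy counts can be negative
    sizes = counts.sum(axis=1)
    total = sizes.sum()
    if total <= 0:
        return 1.0
    impurity = 0.0
    for branch_counts, size in zip(counts, sizes):
        if size > 0:
            p = branch_counts / size
            impurity += (size / total) * (1.0 - np.sum(p ** 2))
    return impurity


def choose_split(holder_data, n_labels, epsilon, rng):
    """Each holder noises its own counts; the coordinator sums and scores them."""
    total = sum(
        noisy_counts(local_counts(X, y, n_labels), epsilon, rng)
        for X, y in holder_data
    )
    scores = [gini_after_split(total[j]) for j in range(total.shape[0])]
    return int(np.argmin(scores))
```

In this sketch each data holder sends one noisy table per node, so the noise in the summed counts grows with the number of holders. The abstract states that LocalRNM reduces both communication and added noise through local optimization at each holder; this sketch does not attempt to reproduce that method.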
