LG CRMay 24, 2023

Differentially-Private Decision Trees and Provable Robustness to Data Poisoning

Daniël Vos, Jelle Vos, Tianyu Li, Zekeriya Erkin, Sicco Verwer

arXiv:2305.15394v25.35 citationsHas Code

Originality Highly original

AI Analysis

This work addresses the challenge of maintaining utility while ensuring privacy and robustness in decision tree learning, which is important for applications requiring interpretable models in sensitive domains, though it is incremental in improving existing differential privacy methods.

The authors tackled the problem of decision tree learning with differential privacy, which often sacrifices utility for privacy, by proposing PrivaTree, a method that uses private histograms to achieve a significantly better privacy-utility trade-off and supports mixed data types without information leakage. They demonstrated that PrivaTree provides meaningful theoretical guarantees for robustness against backdoor attacks, with significantly better robustness compared to regular decision trees.

Decision trees are interpretable models that are well-suited to non-linear learning problems. Much work has been done on extending decision tree learning algorithms with differential privacy, a system that guarantees the privacy of samples within the training data. However, current state-of-the-art algorithms for this purpose sacrifice much utility for a small privacy benefit. These solutions create random decision nodes that reduce decision tree accuracy or spend an excessive share of the privacy budget on labeling leaves. Moreover, many works do not support continuous features or leak information about them. We propose a new method called PrivaTree based on private histograms that chooses good splits while consuming a small privacy budget. The resulting trees provide a significantly better privacy-utility trade-off and accept mixed numerical and categorical data without leaking information about numerical features. Finally, while it is notoriously hard to give robustness guarantees against data poisoning attacks, we demonstrate bounds for the expected accuracy and success rates of backdoor attacks against differentially-private learners. By leveraging the better privacy-utility trade-off of PrivaTree we are able to train decision trees with significantly better robustness against backdoor attacks compared to regular decision trees and with meaningful theoretical guarantees.

View on arXiv PDF Code

Similar