LG · CR · Dec 11, 2023

Classification with Partially Private Features

arXiv:2312.07583v1 · 4 citations · h-index: 46
Originality: Highly original
AI Analysis

This paper addresses privacy-preserving machine learning for data of mixed sensitivity, where only some features are private, and reports better performance than a benchmark that treats all features as sensitive.

The paper tackles differentially private classification when only some features are sensitive. It adapts AdaBoost to be provably private and shows that the result outperforms a natural benchmark that treats all features as sensitive. The experiments also show that boosting randomly generated classifiers is enough to reach high accuracy.

In this paper, we consider differentially private classification when some features are sensitive, while the rest of the features and the label are not. We adapt the definition of differential privacy naturally to this setting. Our main contribution is a novel adaptation of AdaBoost that is not only provably differentially private, but also, in our experiments, significantly outperforms a natural benchmark that treats each individual's entire data as sensitive. As a surprising observation, we show that boosting randomly generated classifiers suffices to achieve high accuracy. Our approach easily adapts to the classical setting where all the features are sensitive, providing an alternative algorithm for differentially private linear classification with a much simpler privacy proof and comparable or higher accuracy than differentially private logistic regression on real-world datasets.
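
To make the "boosting randomly generated classifiers" observation concrete, here is a minimal sketch of plain AdaBoost over a pool of random linear classifiers. It is not the paper's algorithm: it adds no noise and gives no privacy guarantee, and it does not distinguish sensitive from non-sensitive features. All function names and parameters (`random_linear_classifiers`, `adaboost_random`, `n_classifiers`, `n_rounds`) are assumptions made for illustration. The point it shows is that the weak classifiers can be drawn without looking at the data, so only the weighting step depends on the training set.

```python
import numpy as np

def random_linear_classifiers(n_features, n_classifiers, rng):
    """Draw random hyperplanes h(x) = sign(w.x + b); no training data is used."""
    W = rng.standard_normal((n_classifiers, n_features))
    b = rng.standard_normal(n_classifiers)
    return W, b

def predict_weak(W, b, X):
    """Return an (n_classifiers, n_samples) matrix of +1/-1 predictions."""
    return np.where(W @ X.T + b[:, None] >= 0, 1.0, -1.0)

def adaboost_random(X, y, n_classifiers=200, n_rounds=50, seed=0):
    """Non-private AdaBoost over a fixed pool of random classifiers (labels in {-1, +1})."""
    rng = np.random.default_rng(seed)
    W, b = random_linear_classifiers(X.shape[1], n_classifiers, rng)
    H = predict_weak(W, b, X)            # weak predictions, computed once
    D = np.full(len(y), 1.0 / len(y))    # uniform initial sample weights
    alphas = np.zeros(n_classifiers)
    for _ in range(n_rounds):
        # Weighted edge of each candidate: sum_i D_i * y_i * h(x_i), in [-1, 1].
        edges = H @ (D * y)
        j = int(np.argmax(np.abs(edges)))
        edge = np.clip(edges[j], -1 + 1e-12, 1 - 1e-12)
        # Standard AdaBoost weight; a negative edge yields a negative alpha,
        # which amounts to using the flipped classifier.
        alpha = 0.5 * np.log((1 + edge) / (1 - edge))
        alphas[j] += alpha
        D *= np.exp(-alpha * y * H[j])   # up-weight misclassified samples
        D /= D.sum()
    return W, b, alphas

def predict(model, X):
    """Sign of the alpha-weighted vote of the weak classifiers."""
    W, b, alphas = model
    return np.where(alphas @ predict_weak(W, b, X) >= 0, 1.0, -1.0)
```

A private variant would have to randomize the data-dependent quantities (here, the edges and the resulting weights); how the paper does that is not described in the abstract, so it is left out of this sketch.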
