LG · Mar 14, 2018

Model-Agnostic Private Learning via Stability

Raef Bassily, Om Thakkar, Abhradeep Thakurta

arXiv:1803.05101v18.312 citations

Originality: Incremental advance

AI Analysis

This paper addresses privacy-preserving machine learning for practitioners who need model-agnostic solutions, though it builds incrementally on existing stability and privacy techniques.

The paper tackles the problem of designing differentially private learning algorithms that work with any learning model, showing that for feature vectors where ensemble models make consistent predictions, accurate predictions can be generated with almost no privacy cost. The result includes the first computationally efficient construction for a label-private learner with sample complexity bounds depending only on VC dimension.

We design differentially private learning algorithms that are agnostic to the learning model. Our algorithms are interactive in nature, i.e., instead of outputting a model based on the training data, they provide predictions for a set of $m$ feature vectors that arrive online. We show that, for the feature vectors on which an ensemble of models (trained on random disjoint subsets of a dataset) makes consistent predictions, there is almost no cost of privacy in generating accurate predictions for those feature vectors. To that end, we provide a novel coupling of the distance to instability framework with the sparse vector technique. We provide algorithms with formal privacy and utility guarantees for both binary/multi-class classification and soft-label classification. For binary classification in the standard (agnostic) PAC model, we show how to bootstrap from our privately generated predictions to construct a computationally efficient private learner that outputs a final accurate hypothesis. To the best of our knowledge, our construction is the first computationally efficient construction for a label-private learner. We prove sample complexity upper bounds for this setting. As in non-private sample complexity bounds, the only relevant property of the given concept class is its VC dimension. For soft-label classification, our techniques are based on exploiting the stability properties of traditional learning algorithms, like stochastic gradient descent (SGD). We provide a new technique to boost the average-case stability properties of learning algorithms to strong (worst-case) stability properties, and then exploit them to obtain private classification algorithms. In the process, we also show that a large class of SGD methods satisfy average-case stability properties, in contrast to a smaller class of SGD methods that are uniformly stable as shown in prior work.
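The ensemble-and-stability idea in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes the per-query votes of $k$ models (each trained on a disjoint random subset) are already available, uses the gap between the top two vote counts as a stand-in for the distance to instability, and applies a sparse-vector-style noisy threshold test. The function names, the `threshold`, `noise_scale`, and `max_unstable` parameters, and the noise scales are all hypothetical and are not calibrated to any particular privacy budget.

```python
import random
from collections import Counter


def laplace(rng, scale):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_ensemble_predict(votes, threshold, noise_scale, max_unstable, rng):
    """Answer queries online from ensemble votes (illustrative sketch only).

    votes: list of rows, one per query; each row holds the k labels
           predicted by the k models of the ensemble.
    Returns one entry per query: the majority label when the noisy vote
    gap clears the noisy threshold, otherwise None.
    """
    answers = []
    unstable_left = max_unstable  # how many unstable queries we tolerate
    noisy_threshold = threshold + laplace(rng, noise_scale)
    for row in votes:
        if unstable_left == 0:
            # Budget exhausted: stop answering.
            answers.append(None)
            continue
        ranked = Counter(row).most_common(2)
        top_label, top_count = ranked[0]
        second_count = ranked[1][1] if len(ranked) > 1 else 0
        gap = top_count - second_count  # proxy for distance to instability
        if gap + laplace(rng, 2 * noise_scale) > noisy_threshold:
            # Stable query: release the majority label.
            answers.append(top_label)
        else:
            # Unstable query: refuse, spend budget, refresh the threshold.
            answers.append(None)
            unstable_left -= 1
            noisy_threshold = threshold + laplace(rng, noise_scale)
    return answers
```

In this sketch, queries on which the ensemble agrees strongly are answered without touching the budget, and only the unstable queries consume it, which mirrors the abstract's claim that consistent predictions come at almost no cost of privacy.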

