LG CR ML · Dec 17, 2018

Noninteractive Locally Private Learning of Linear Models via Polynomial Approximations

arXiv:1812.06825v38.724 citations

Originality: Incremental advance

AI Analysis

This work addresses privacy-preserving machine learning for distributed data, offering incremental improvements in sample complexity for specific loss functions in the local model.

The paper tackles convex optimization under local differential privacy (LDP), specifically noninteractive protocols in which each user submits a single randomized report. It presents new algorithms for generalized linear losses and for the Euclidean median problem. For generalized linear losses the sample complexity is linear in the dimension $p$, the first sub-exponential dependence on $p$ for nonsmooth losses; for the Euclidean median it is quasipolynomial in $p$, the first sub-exponential dependence for a loss that is not a generalized linear loss.

Minimizing a convex risk function is the main step in many basic learning algorithms. We study protocols for convex optimization which provably leak very little about the individual data points that constitute the loss function. Specifically, we consider differentially private algorithms that operate in the local model (local differential privacy, LDP), where each data record is stored on a separate user device and randomization is performed locally by those devices. We give new protocols for \emph{noninteractive} LDP convex optimization---i.e., protocols that require only a single randomized report from each user to an untrusted aggregator. We study our algorithms' performance with respect to expected loss---either over the data set at hand (empirical risk) or a larger population from which our data set is assumed to be drawn. Our error bounds depend on the form of individuals' contribution to the expected loss. For the case of \emph{generalized linear losses} (such as hinge and logistic losses), we give an LDP algorithm whose sample complexity is only linear in the dimensionality $p$ and quasipolynomial in other terms (the privacy parameters $ε$ and $δ$, and the desired excess risk $α$). This is the first algorithm for nonsmooth losses with sub-exponential dependence on $p$. For the Euclidean median problem, where the loss is given by the Euclidean distance to a given data point, we give a protocol whose sample complexity grows quasipolynomially in $p$. This is the first protocol with sub-exponential dependence on $p$ for a loss that is not a generalized linear loss. Our result for the hinge loss is based on a technique, dubbed polynomial of inner product approximation, which may be applicable to other problems. Our results for generalized linear losses and the Euclidean median are based on new reductions to the case of hinge loss.
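
To make the idea of approximating the hinge loss by a polynomial in the inner product more concrete, here is a minimal sketch. It is not the paper's construction: it only interpolates the scalar function max(0, 1 - t) at Chebyshev points and reports the worst-case error on a grid, where t stands in for the label-weighted inner product y<w, x>. The interval [-2, 2] for t, the chosen degrees, and the function names `hinge`, `chebyshev_hinge`, and `max_abs_error` are assumptions made purely for illustration.

```python
import numpy as np
from numpy.polynomial import Chebyshev

# Assumed range of t = y * <w, x>; not taken from the paper.
DOMAIN = [-2.0, 2.0]


def hinge(t):
    """Hinge loss as a function of the margin t."""
    return np.maximum(0.0, 1.0 - t)


def chebyshev_hinge(degree):
    """Degree-`degree` polynomial interpolating the hinge loss at Chebyshev points on DOMAIN."""
    return Chebyshev.interpolate(hinge, degree, domain=DOMAIN)


def max_abs_error(poly, num_points=4001):
    """Largest absolute gap between `poly` and the hinge loss on a uniform grid over DOMAIN."""
    grid = np.linspace(DOMAIN[0], DOMAIN[1], num_points)
    return float(np.max(np.abs(poly(grid) - hinge(grid))))


if __name__ == "__main__":
    # The error shrinks slowly with the degree because of the kink at t = 1.
    for degree in (4, 16, 64):
        print(degree, max_abs_error(chebyshev_hinge(degree)))
```

Because the hinge loss has a kink, the approximation error decays only slowly as the degree grows, which gives some intuition for why a polynomial-based approach might pay more than polynomially in terms such as the target excess risk $α$. How the paper turns such a polynomial into a single private report per user is not shown here.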
