Private Stochastic Convex Optimization: Optimal Rates in Linear Time
This work addresses the computational inefficiency in differentially private optimization, enabling faster and more scalable privacy-preserving machine learning for applications like sensitive data analysis.
The paper tackles the problem of designing differentially private algorithms for stochastic convex optimization that achieve optimal excess population loss bounds while matching the computational efficiency of non-private methods, resulting in algorithms that use O(min{n, n^2/d}) gradient computations and maintain optimal privacy guarantees.
We study differentially private (DP) algorithms for stochastic convex optimization: the problem of minimizing the population loss given i.i.d. samples from a distribution over convex loss functions. A recent work of Bassily et al. (2019) has established the optimal bound on the excess population loss achievable given $n$ samples. Unfortunately, their algorithm achieving this bound is relatively inefficient: it requires $O(\min\{n^{3/2}, n^{5/2}/d\})$ gradient computations, where $d$ is the dimension of the optimization problem. We describe two new techniques for deriving DP convex optimization algorithms both achieving the optimal bound on excess loss and using $O(\min\{n, n^2/d\})$ gradient computations. In particular, the algorithms match the running time of the optimal non-private algorithms. The first approach relies on the use of variable batch sizes and is analyzed using the privacy amplification by iteration technique of Feldman et al. (2018). The second approach is based on a general reduction to the problem of localizing an approximately optimal solution with differential privacy. Such localization, in turn, can be achieved using existing (non-private) uniformly stable optimization algorithms. As in the earlier work, our algorithms require a mild smoothness assumption. We also give a linear-time algorithm achieving the optimal bound on the excess loss for the strongly convex case, as well as a faster algorithm for the non-smooth case.