How to Make the Gradients Small Privately: Improved Rates for Differentially Private Non-Convex Optimization
This work addresses the challenge of private optimization in non-convex settings, which is crucial for applications like training neural networks and learning dynamical systems, though it appears incremental as it builds on existing frameworks to refine rates.
The paper tackles the problem of finding approximate stationary points for non-convex loss functions under differential privacy constraints, achieving improved or optimal rates across multiple function classes, including smooth non-convex, quasar-convex, KL-condition, population loss, and generalized linear models, with concrete improvements over prior state-of-the-art.
We provide a simple and flexible framework for designing differentially private algorithms to find approximate stationary points of non-convex loss functions. Our framework is based on using a private approximate risk minimizer to "warm start" another private algorithm for finding stationary points. We use this framework to obtain improved, and sometimes optimal, rates for several classes of non-convex loss functions. First, we obtain improved rates for finding stationary points of smooth non-convex empirical loss functions. Second, we specialize to quasar-convex functions, which generalize star-convex functions and arise in learning dynamical systems and training some neural nets. We achieve the optimal rate for this class. Third, we give an optimal algorithm for finding stationary points of functions satisfying the Kurdyka-Lojasiewicz (KL) condition. For example, over-parameterized neural networks often satisfy this condition. Fourth, we provide new state-of-the-art rates for stationary points of non-convex population loss functions. Fifth, we obtain improved rates for non-convex generalized linear models. A modification of our algorithm achieves nearly the same rates for second-order stationary points of functions with Lipschitz Hessian, improving over the previous state-of-the-art for each of the above problems.