Beyond Uniform Lipschitz Condition in Differentially Private Optimization
This work addresses a key limitation in DP-SGD theory for privacy-preserving machine learning: the assumption that per-sample gradients are uniformly bounded. It offers practical clip norm guidance for scenarios such as privately training a softmax layer on top of a pre-trained network, though the contribution is incremental in that it extends existing methods.
The paper tackles the unrealistic uniform Lipschitzness assumption in differentially private stochastic gradient descent (DP-SGD) by generalizing it to allow sample-dependent, potentially unbounded per-sample Lipschitz constants. It provides principled clip norm guidance for convex over-parameterized settings and verifies that guidance experimentally on 8 datasets.
Most prior results on differentially private stochastic gradient descent (DP-SGD) are derived under the simplistic assumption of uniform Lipschitzness, i.e., the per-sample gradients are uniformly bounded. We generalize uniform Lipschitzness by assuming that the per-sample gradients have sample-dependent upper bounds, i.e., per-sample Lipschitz constants, which themselves may be unbounded. We provide principled guidance on choosing the clip norm in DP-SGD for convex over-parameterized settings satisfying our general version of Lipschitzness when the per-sample Lipschitz constants are bounded; specifically, we recommend tuning the clip norm only up to the minimum per-sample Lipschitz constant. This finds application in the private training of a softmax layer on top of a deep network pre-trained on public data. We verify the efficacy of our recommendation via experiments on 8 datasets. Furthermore, we provide new convergence results for DP-SGD on convex and nonconvex functions when the Lipschitz constants are unbounded but have bounded moments, i.e., they are heavy-tailed.
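The clip norm recommendation can be illustrated with a minimal NumPy sketch for the softmax-layer setting. It is not the paper's implementation, and everything named here is an assumption made for illustration: the function names, the geometric grid, and the use of sqrt(2) * ||x_i|| as the per-sample Lipschitz constant. That last choice is the standard bound on the Frobenius norm of the softmax cross-entropy gradient (p - y) x^T with respect to the weight matrix; the paper may define its constants differently.

```python
import numpy as np

def per_sample_lipschitz(features):
    # Assumption: for softmax cross-entropy on linear logits W @ x, the
    # per-sample gradient (p - y) x^T has Frobenius norm <= sqrt(2) * ||x||.
    return np.sqrt(2.0) * np.linalg.norm(features, axis=1)

def clip_norm_grid(features, num_values=5):
    # Hypothetical tuning grid: geometrically spaced candidates whose largest
    # value is the minimum per-sample Lipschitz constant.
    l_min = per_sample_lipschitz(features).min()
    return l_min * np.logspace(-2, 0, num_values)

def per_sample_grads(W, features, labels):
    # Per-sample softmax cross-entropy gradients, shape (n, classes, dim).
    n = features.shape[0]
    logits = features @ W.T
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    probs[np.arange(n), labels] -= 1.0  # p - y
    return probs[:, :, None] * features[:, None, :]

def dp_sgd_step(W, features, labels, clip_norm, noise_multiplier, lr, rng):
    # One DP-SGD step: clip each per-sample gradient to clip_norm, sum,
    # add Gaussian noise with std noise_multiplier * clip_norm, then average.
    n = features.shape[0]
    grads = per_sample_grads(W, features, labels)
    norms = np.linalg.norm(grads.reshape(n, -1), axis=1)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = grads * scale[:, None, None]
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=W.shape)
    return W - lr * (clipped.sum(axis=0) + noise) / n
```

In use, one would run DP-SGD once per value in `clip_norm_grid(features)` and keep the best-performing model. The privacy cost of that hyperparameter search is not accounted for in this sketch.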