Wide Network Learning with Differential Privacy
The work addresses the challenge of maintaining accuracy in privacy-sensitive applications such as NLP typeahead and recommender systems, though the contribution is incremental, since it builds on existing differential privacy methods.
The paper tackles the problem of training wide neural networks with differential privacy, a setting that typically incurs significant accuracy loss. It exploits gradient sparsity in embedding layers to obtain a loss that depends only logarithmically on the parameter count, and it demonstrates the approach on a real-world dataset.
Despite intense interest and considerable effort, the current generation of neural networks suffers a significant loss of accuracy under most practically relevant privacy-preserving training regimes. One particularly challenging class is that of wide neural networks, such as those deployed for NLP typeahead prediction or recommender systems. Observing that these models share a common feature, an embedding layer that reduces the dimensionality of the input, we focus on developing a general approach to training them that takes advantage of the sparsity of the gradients. More abstractly, we address the problem of differentially private empirical risk minimization (ERM) for models that admit sparse gradients. We demonstrate that for non-convex ERM problems, the loss depends logarithmically on the number of parameters, in contrast with the polynomial dependence of the general case. Following the same intuition, we propose a novel algorithm for privately training neural networks. Finally, we provide an empirical study of a DP wide neural network on a real-world dataset, a setting that has rarely been explored in previous work.
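To make the sparsity observation concrete, the following minimal sketch computes the per-example gradient of a mean-pooled embedding lookup and clips it to a fixed L2 norm. It is an illustration only, not the algorithm proposed in the paper: the function names, the mean-pooling choice, the vocabulary size, and all numeric values are assumptions, and no privacy noise is added. The point is that a single example touches only the rows indexed by its tokens, whereas standard noisy gradient descent would perturb every entry of the embedding table.

```python
import math

def embedding_grad_rows(token_ids, upstream_grad):
    """Per-example gradient of a mean-pooled embedding lookup (hypothetical helper).

    Only rows indexed by token_ids are nonzero, so the gradient is stored as
    {row_index: gradient_vector} rather than a dense vocab_size x dim table.
    """
    n = len(token_ids)
    rows = {}
    for t in token_ids:
        row = rows.setdefault(t, [0.0] * len(upstream_grad))
        for j, g in enumerate(upstream_grad):
            row[j] += g / n  # repeated tokens accumulate into the same row
    return rows

def clip_rows(rows, max_norm):
    """Scale the sparse gradient so its overall L2 norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for row in rows.values() for v in row))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return {t: [v * scale for v in row] for t, row in rows.items()}

# Assumed toy values: a 50,000-row table and one example with 4 tokens.
vocab_size, dim = 50_000, 4
token_ids = [17, 42, 17, 9001]          # touches 3 distinct rows
upstream_grad = [0.5, -1.0, 2.0, 0.25]  # gradient w.r.t. the pooled embedding

grad = clip_rows(embedding_grad_rows(token_ids, upstream_grad), max_norm=1.0)
nonzero_rows = len(grad)
grad_norm = math.sqrt(sum(v * v for row in grad.values() for v in row))
print(f"{nonzero_rows} of {vocab_size} rows are nonzero; clipped norm = {grad_norm:.3f}")
```

In this toy setting the clipped gradient occupies 3 of 50,000 rows, which is the kind of structure a sparsity-aware private training method could exploit.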