Sensitivity analysis in differentially private machine learning using hybrid automatic differentiation
This work addresses the problem of principled privacy analysis for data processing tasks in machine learning, representing an incremental advancement in automatic sensitivity analysis tools.
The paper tackles the challenge of analysing individual privacy loss in differentially private machine learning. It introduces a hybrid automatic differentiation system that combines the efficiency of reverse-mode AD with the ability to obtain closed-form expressions, which enables modelling the sensitivity of arbitrary differentiable function compositions, such as neural network training on private data.
In recent years, formal methods of privacy protection such as differential privacy (DP), which can be deployed in data-driven tasks such as machine learning (ML), have emerged. Reconciling large-scale ML with the closed-form reasoning required for the principled analysis of individual privacy loss requires new tools for automatic sensitivity analysis and for tracking an individual's data and their features through the flow of computation. For this purpose, we introduce a novel \textit{hybrid} automatic differentiation (AD) system which combines the efficiency of reverse-mode AD with the ability to obtain a closed-form expression for any given quantity in the computational graph. This enables modelling the sensitivity of arbitrary differentiable function compositions, such as the training of neural networks on private data. We demonstrate our approach by analysing the individual DP guarantees of statistical database queries. Moreover, we investigate the application of our technique to the training of DP neural networks. Our approach can enable principled reasoning about privacy loss in data processing settings and further the development of automatic sensitivity analysis and privacy budgeting systems.
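To make the idea of a hybrid AD system concrete, the following is a minimal illustrative sketch in Python, not the system described in this work. It assumes a toy scalar graph in which every node stores a numeric value (for reverse-mode gradient accumulation) alongside a string holding its closed-form expression. The names `Node`, `backward`, `mean_query`, and the data bounds `[0, 1]` are hypothetical choices made for illustration. The per-individual sensitivity is bounded here by the gradient magnitude times the width of the data domain, which is exact only because the mean query is linear in each input; a non-linear query would require bounding the gradient over the whole domain.

```python
class Node:
    """Scalar graph node holding a value, a closed-form expression string,
    and the local derivatives with respect to its parents."""

    def __init__(self, value, expr, parents=()):
        self.value = value
        self.expr = expr
        self.parents = parents  # tuple of (parent_node, local_derivative)
        self.grad = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Node(self.value + other.value,
                    f"({self.expr} + {other.expr})",
                    ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        other = _lift(other)
        return Node(self.value * other.value,
                    f"({self.expr} * {other.expr})",
                    ((self, other.value), (other, self.value)))


def _lift(x):
    """Wrap a plain number as a constant node."""
    return x if isinstance(x, Node) else Node(float(x), str(x))


def backward(output):
    """Reverse-mode pass: accumulate d(output)/d(node) into node.grad."""
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)  # parents are appended before their children

    visit(output)
    output.grad = 1.0
    for node in reversed(order):  # children are processed before parents
        for parent, local in node.parents:
            parent.grad += local * node.grad


def mean_query(xs):
    """Mean of the individuals' values, built from graph operations."""
    total = xs[0]
    for x in xs[1:]:
        total = total + x
    return total * (1.0 / len(xs))


# Hypothetical database: one scalar per individual, assumed to lie in [0, 1].
bounds = (0.0, 1.0)
xs = [Node(v, f"x{i}") for i, v in enumerate([0.2, 0.9, 0.4, 0.5])]

out = mean_query(xs)
backward(out)

# Sensitivity bound per individual: |d out / d x_i| * (upper - lower).
# Exact here because the query is linear in each x_i (assumption stated above).
individual_sensitivity = [abs(x.grad) * (bounds[1] - bounds[0]) for x in xs]

print(out.expr)
print(individual_sensitivity)
```

In this sketch the expression string plays the role of the closed-form view of the computation, while `backward` supplies the efficient numeric gradients. A real system would use a symbolic representation that can be simplified and bounded, rather than a plain string.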