Encrypted statistical machine learning: new privacy preserving methods
This work addresses privacy-preserving machine learning for applications involving sensitive data. It is an incremental advance that adapts existing statistical learning methods so that they can run on encrypted data.
The authors tackle the problem of performing statistical machine learning on data encrypted under fully homomorphic encryption, proposing new methods for extremely random forests and naïve Bayes that perform competitively on a variety of classification data sets.
We present two new statistical machine learning methods designed to learn on fully homomorphic encrypted (FHE) data. The introduction of FHE schemes following Gentry (2009) opens up the prospect of privacy-preserving statistical machine learning analysis and modelling of encrypted data without compromising security constraints. We propose tailored algorithms for two methods and show how they can be used to learn and predict from encrypted data: extremely random forests, which involve a new cryptographic stochastic fraction estimator, and naïve Bayes, which involves a semi-parametric model for the class decision boundary. We demonstrate that these techniques perform competitively on a variety of classification data sets and provide detailed information about the computational practicalities of these and other FHE methods.
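To give a feel for why learning algorithms need tailoring for FHE, the sketch below models the basic restriction of such schemes: ciphertexts can be added and multiplied, but not compared or divided. It is an illustration only and not the authors' method. `MockCiphertext`, `encrypt`, `decrypt` and `encrypted_counts` are hypothetical names, and the mock class performs no real encryption. Under these assumptions, the per-class feature counts that a naïve Bayes style classifier builds on can still be accumulated, because with one-hot encoded labels each count is a sum of products of bits.

```python
class MockCiphertext:
    """Toy stand-in for an FHE ciphertext (assumption: hypothetical class).

    It hides nothing. It only models the restriction that addition and
    multiplication are the available operations; no comparison or
    division operators are defined.
    """

    def __init__(self, value: int):
        self._value = value

    def __add__(self, other: "MockCiphertext") -> "MockCiphertext":
        return MockCiphertext(self._value + other._value)

    def __mul__(self, other: "MockCiphertext") -> "MockCiphertext":
        return MockCiphertext(self._value * other._value)


def encrypt(message: int) -> MockCiphertext:
    """Hypothetical helper: wrap a plaintext integer."""
    return MockCiphertext(message)


def decrypt(ciphertext: MockCiphertext) -> int:
    """Hypothetical helper: unwrap, standing in for use of the secret key."""
    return ciphertext._value


def encrypted_counts(features, labels):
    """Accumulate a (classes x features) table of encrypted counts.

    features: n rows, each a list of d encrypted 0/1 feature bits.
    labels:   n rows, each a list of k encrypted one-hot class bits.
    Entry [c][j] counts the rows of class c in which feature j is 1,
    computed as sum_i labels[i][c] * features[i][j], using only + and *.
    """
    n_classes = len(labels[0])
    n_features = len(features[0])
    counts = [[encrypt(0) for _ in range(n_features)] for _ in range(n_classes)]
    for x_row, y_row in zip(features, labels):
        for c in range(n_classes):
            for j in range(n_features):
                counts[c][j] = counts[c][j] + y_row[c] * x_row[j]
    return counts


def encrypt_rows(rows):
    """Encrypt every entry of a list of integer rows."""
    return [[encrypt(v) for v in row] for row in rows]
```

In this sketch the party holding the data never branches on a label or feature value; the counts only become readable after `decrypt`. Turning such counts into predictions would normally require division and comparison, which are not available on ciphertexts, and that gap is the kind of obstacle the tailored algorithms in this work are designed to address.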