LG ML · Nov 21, 2018

Privacy-Preserving Collaborative Prediction using Random Forests

Irene Giacomelli, Somesh Jha, Ross Kleiman, David Page, Kyonghwan Yoon

arXiv:1811.08695v16.225 citations

Originality: Incremental advance

AI Analysis

The work addresses privacy concerns in sensitive applications such as clinical decision support from EHR data held by different clinics. It is an incremental advance, since it adapts existing ensemble methods to a privacy-preserving setting.

The paper tackles privacy-preserving machine learning for ensemble methods, specifically random forests. It proposes a collaborative approach in which each entity trains a model locally and predictions are computed without revealing extra information. Experiments on real-world datasets, including EHR data, demonstrate high efficiency and a potential accuracy benefit.

We study the problem of privacy-preserving machine learning (PPML) for ensemble methods, focusing our effort on random forests. In collaborative analysis, PPML attempts to solve the conflict between the need for data sharing and privacy. This is especially important in privacy-sensitive applications such as learning predictive models for clinical decision support from EHR data from different clinics, where each clinic has a responsibility for its patients' privacy. We propose a new approach for ensemble methods: each entity learns a model from its own data, and then, when a client asks for the prediction for a new private instance, the answers from all the locally trained models are used to compute the prediction in such a way that no extra information is revealed. We implement this approach for random forests and we demonstrate its high efficiency and potential accuracy benefit via experiments on real-world datasets, including actual EHR data.
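To make the idea in the abstract concrete, here is a minimal sketch of collaborative prediction over locally trained forests. It is not the paper's protocol: the abstract does not specify the cryptographic construction, so this sketch swaps in plain additive secret sharing to combine per-party vote counts, and it does not hide the client's instance from the parties. The names `stump`, `share`, `secure_sum`, and `collaborative_predict` are hypothetical, the "trees" are toy decision stumps, and the number of trees per party is assumed to be public.

```python
import secrets

PRIME = 2**61 - 1  # modulus for additive shares (assumption: far larger than any vote count)

def stump(feature, threshold):
    """Toy stand-in for a trained tree: votes 1 if x[feature] > threshold, else 0."""
    return lambda x: int(x[feature] > threshold)

def share(value, n_parties):
    """Split `value` into n_parties random additive shares modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def secure_sum(private_values):
    """Sum the parties' private values while each party sees only random-looking shares.

    all_shares[i][j] is the share that party i sends to party j. Each party j
    publishes only the sum of the shares it received; those partial sums add up
    to the true total.
    """
    n = len(private_values)
    all_shares = [share(v, n) for v in private_values]
    partial_sums = [sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)]
    return sum(partial_sums) % PRIME

def collaborative_predict(parties, x):
    """Majority vote over all local forests, revealing only the total vote count."""
    local_votes = [sum(tree(x) for tree in trees) for trees in parties]  # stays local
    total_trees = sum(len(trees) for trees in parties)  # assumed public
    positive_votes = secure_sum(local_votes)
    return 1 if 2 * positive_votes > total_trees else 0

# Two hypothetical clinics, each holding its own small "forest".
parties = [
    [stump(0, 0.5), stump(1, 0.2)],
    [stump(0, 0.7), stump(1, 0.9), stump(0, 0.1)],
]
x = (0.6, 0.3)
prediction = collaborative_predict(parties, x)
```

In this toy run the first party casts two positive votes and the second casts one, so three of five trees vote positive and the combined prediction is class 1. Only the aggregate count is reconstructed; neither party's individual vote count is disclosed to the other.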
