LG, CR | Oct 7, 2019

Differential Privacy-enabled Federated Learning for Sensitive Health Data

Olivia Choudhury, Aris Gkoulalas-Divanis, Theodoros Salonidis, Issa Sylla, Yoonyoung Park, Grace Hsu, Amar Das

arXiv:1910.02578v320.2211 citations | h-index: 32

Originality: Incremental advance

AI Analysis

The paper addresses privacy and data-silo challenges in healthcare applications, but its contribution is incremental because it combines existing federated learning and differential privacy techniques.

The paper tackles the problem of training machine learning models on sensitive health data that is distributed across multiple sites, without sharing raw data. It introduces a federated learning framework with differential privacy and demonstrates its feasibility and effectiveness on real-world data from 1 million patients while maintaining model utility.

Leveraging real-world health data for machine learning tasks requires addressing many practical challenges, such as distributed data silos, privacy concerns with creating a centralized database from person-specific sensitive data, resource constraints for transferring and integrating data from multiple sites, and risk of a single point of failure. In this paper, we introduce a federated learning framework that can learn a global model from distributed health data held locally at different sites. The framework offers two levels of privacy protection. First, it does not move or share raw data across sites or with a centralized server during the model training process. Second, it uses a differential privacy mechanism to further protect the model from potential privacy attacks. We perform a comprehensive evaluation of our approach on two healthcare applications, using real-world electronic health data of 1 million patients. We demonstrate the feasibility and effectiveness of the federated learning framework in offering an elevated level of privacy and maintaining utility of the global model.
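The two levels of protection described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the model, the aggregation rule, or the differential privacy mechanism, so the code below assumes logistic regression, simple averaging of site updates, and clipped updates with Gaussian noise. All function names, parameters, and the synthetic data are hypothetical, and `noise_std` is not calibrated to any formal (epsilon, delta) guarantee.

```python
import math
import random


def local_update(weights, data, lr=0.1, epochs=5):
    """Train logistic regression at one site; raw records never leave it."""
    w = list(weights)
    for _ in range(epochs):
        for features, label in data:
            z = sum(wi * xi for wi, xi in zip(w, features))
            z = max(-30.0, min(30.0, z))  # guard against overflow in exp
            pred = 1.0 / (1.0 + math.exp(-z))
            for i, xi in enumerate(features):
                w[i] -= lr * (pred - label) * xi
    return w


def privatize(update, clip_norm, noise_std, rng):
    """Clip the update to L2 norm clip_norm, then add Gaussian noise.

    Assumed mechanism for illustration only; noise is not calibrated.
    """
    norm = math.sqrt(sum(u * u for u in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [u * scale + rng.gauss(0.0, noise_std) for u in update]


def federated_round(global_w, sites, clip_norm, noise_std, rng):
    """One round: each site shares only a clipped, noised weight delta."""
    deltas = []
    for data in sites:
        local_w = local_update(global_w, data)
        delta = [lw - gw for lw, gw in zip(local_w, global_w)]
        deltas.append(privatize(delta, clip_norm, noise_std, rng))
    avg = [sum(col) / len(deltas) for col in zip(*deltas)]
    return [gw + a for gw, a in zip(global_w, avg)]


def make_site(n, rng):
    """Synthetic stand-in for one site's records (not real health data)."""
    data = []
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1), 1.0]  # last entry is a bias term
        label = 1.0 if x[0] + x[1] > 0 else 0.0
        data.append((x, label))
    return data


rng = random.Random(0)
sites = [make_site(50, rng) for _ in range(3)]
weights = [0.0, 0.0, 0.0]
for _ in range(10):
    weights = federated_round(weights, sites, clip_norm=1.0, noise_std=0.01, rng=rng)
```

In this sketch the server only ever sees the output of `privatize`, which corresponds to the first level of protection (no raw data leaves a site) combined with the second (noise added to what is shared). Raising `noise_std` would strengthen the privacy protection at the cost of model utility, which is the trade-off the paper evaluates.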
