Privacy-Preserving Distributed Deep Learning for Clinical Data
This work addresses privacy concerns in medical data sharing for healthcare providers and researchers, offering a provable privacy solution, though it builds incrementally on existing distributed training methods.
The paper tackles the problem of training deep learning models on clinical data across multiple institutions without compromising patient privacy. It introduces a method for distributed training under differential privacy and demonstrates it on two multi-site datasets, eICU and TCGA.
Deep learning with medical data often requires larger sample sizes than are available at single providers. While data sharing among institutions is desirable to train more accurate and sophisticated models, it can lead to severe privacy concerns due to the sensitive nature of the data. This problem has motivated a number of studies on distributed training of neural networks that do not require direct sharing of the training data. However, simple distributed training does not offer provable privacy guarantees to satisfy technical safe standards and may reveal information about the underlying patients. We present a method to train neural networks for clinical data in a distributed fashion under differential privacy. We demonstrate this method on two datasets that include information from multiple independent sites, the eICU Collaborative Research Database and The Cancer Genome Atlas.
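To make the idea of distributed training under differential privacy concrete, here is a minimal sketch. It is not the paper's implementation: the abstract does not state the mechanism, so the sketch assumes the common DP-SGD recipe (clip each per-example gradient, then add Gaussian noise) and assumes that model weights are passed from site to site in turn. The logistic-regression model and the names `dp_sgd_step`, `train_across_sites`, `max_norm`, and `noise_multiplier` are hypothetical, and no privacy accounting (tracking of the spent privacy budget) is included.

```python
import math
import random


def clip(grad, max_norm):
    """Scale grad so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def per_example_grad(w, x, y):
    """Gradient of the logistic loss for one record (x, y), with y in {0, 1}."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return [(p - y) * xi for xi in x]


def dp_sgd_step(w, batch, lr, max_norm, noise_multiplier, rng):
    """One assumed DP-SGD step: clip per-example gradients, sum, add noise, average."""
    total = [0.0] * len(w)
    for x, y in batch:
        g = clip(per_example_grad(w, x, y), max_norm)
        total = [t + gi for t, gi in zip(total, g)]
    # Noise scale is tied to the clipping bound, which limits any one record's influence.
    sigma = noise_multiplier * max_norm
    noisy = [(t + rng.gauss(0.0, sigma)) / len(batch) for t in total]
    return [wi - lr * gi for wi, gi in zip(w, noisy)]


def train_across_sites(sites, dim, rounds, lr=0.1, max_norm=1.0,
                       noise_multiplier=1.0, seed=0):
    """Pass weights from site to site; only weights move, never raw records."""
    rng = random.Random(seed)
    w = [0.0] * dim
    for _ in range(rounds):
        for site_data in sites:  # each entry stands in for one institution's local data
            w = dp_sgd_step(w, site_data, lr, max_norm, noise_multiplier, rng)
    return w
```

In this sketch each site only ever receives and returns a weight vector, and every update it contributes has been clipped and noised before leaving the site. A real deployment would use mini-batch sampling and a privacy accountant to report the resulting privacy guarantee.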