CR · Feb 11, 2020

Privacy-preserving collaborative machine learning on genomic data using TensorFlow

arXiv:2002.04344v2 · 18 citations
AI Analysis

This work addresses privacy concerns for stakeholders such as hospitals and healthcare companies in genomic studies. Its contribution is incremental, since it builds on existing MPC and TensorFlow frameworks.

The paper tackles the problem of enabling privacy-preserving collaborative machine learning on sensitive genomic data held by multiple stakeholders, achieving first place in the iDASH2019 secure genome analysis competition.

Machine learning (ML) methods have been widely used in genomic studies. However, genomic data are often held by different stakeholders (e.g. hospitals, universities, and healthcare companies) who consider the data sensitive, even though they wish to collaborate. To address this issue, recent works have proposed solutions using Secure Multi-party Computation (MPC), which train on the decentralized data in such a way that the participants learn nothing from each other beyond the final trained model. We design and implement several MPC-friendly ML primitives, including class weight adjustment and a parallelizable approximation of the activation function. In addition, we develop the solution as an extension to TF Encrypted (Dahl et al., 2018), enabling us to quickly experiment with enhancements of both machine learning techniques and cryptographic protocols while leveraging the advantages of TensorFlow's optimizations. Our implementation compares favorably with state-of-the-art methods, winning first place in Track IV of the iDASH2019 secure genome analysis competition.
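To make the MPC idea in the abstract concrete, here is a minimal sketch of additive secret sharing, one common building block for computing on data split across parties. It is an illustration only, not the protocol used in the paper or in TF Encrypted: the ring size, the fixed-point scale, the three-party setting, and all function names are assumptions chosen for this example.

```python
import secrets

# Assumptions for this sketch (not taken from the paper):
MODULUS = 2**64   # all arithmetic is done in the ring of integers mod 2^64
SCALE = 2**16     # fixed-point scaling factor for encoding real numbers


def encode(x):
    """Encode a real number as a fixed-point ring element."""
    return round(x * SCALE) % MODULUS


def decode(v):
    """Decode a ring element back to a real number (upper half = negatives)."""
    if v >= MODULUS // 2:
        v -= MODULUS
    return v / SCALE


def share(value, n_parties=3):
    """Split a ring element into n additive shares that sum to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def reconstruct(shares):
    """Recombine all shares; fewer than all of them reveal nothing about the value."""
    return sum(shares) % MODULUS


def add_shared(a, b):
    """Each party adds its own two shares locally; no communication is needed."""
    return [(x + y) % MODULUS for x, y in zip(a, b)]


# Two data holders secret-share one value each; the sum is computed on shares.
a_shares = share(encode(1.5))
b_shares = share(encode(-0.25))
total = decode(reconstruct(add_shared(a_shares, b_shares)))
print(total)
```

Addition works share-by-share, which is why linear operations are cheap in this setting. Multiplication and non-linear activation functions require interaction between parties, which is one reason MPC-friendly approximations of the kind the abstract mentions are useful.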

Code Implementations: 1 repo
