Differentially Private Federated Learning for Cancer Prediction
This work addresses privacy-preserving cancer prediction for medical researchers and institutions, offering an incremental contribution evaluated on a competitive benchmark.
This paper presents a federated learning approach for breast cancer prediction from genomic data split between two virtual centers, using differential privacy to protect the exchanged models. The authors placed 3rd in the iDASH 2020 competition, and the work focuses on balancing prediction performance against privacy budget constraints.
Since 2014, the NIH-funded iDASH (integrating Data for Analysis, Anonymization, and SHaring) National Center for Biomedical Computing has hosted yearly competitions on private computing for genomic data. One track of the 2020 competition challenged participants to train genomic cancer prediction models with federated learning (FL) under differential privacy (DP). Submissions were ranked by held-out test accuracy at a given set of DP budgets. More precisely, the task was to train a supervised model that predicts breast cancer occurrence from genomic data split between two virtual centers, using DP to protect the data against leakage through the models transferred between the centers. In this article, we present our 3rd-place submission to this competition. During the competition, we faced two main challenges, both discussed in this article: i) ensuring the correctness of the privacy budget evaluation and ii) achieving an acceptable trade-off between prediction performance and privacy budget.
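The paragraph above does not specify the training or accounting scheme. The following sketch illustrates what "FL under a DP budget" can mean mechanically for two centers. It is not the submission's method. It assumes:

- a logistic-regression model;
- per-round L2 clipping of each center's model update;
- the classic Gaussian mechanism (valid only for per-round epsilon below 1);
- basic sequential composition to split a total (epsilon, delta) budget evenly across rounds.

Each center's whole clipped update is treated as the unit to protect. Any two clipped updates then differ by at most twice the clip norm, which makes the sensitivity easy to verify but adds much more noise than per-example schemes. All names (`dp_fedavg`, `local_update`, `clip_l2`, `gaussian_sigma`), the toy data, and the parameter values are hypothetical.

```python
import math
import random


def gaussian_sigma(epsilon, delta, sensitivity):
    """Noise std from the classic Gaussian-mechanism bound (valid for 0 < epsilon < 1)."""
    if not 0.0 < epsilon < 1.0:
        raise ValueError("classic Gaussian calibration needs 0 < epsilon < 1")
    return math.sqrt(2.0 * math.log(1.25 / delta)) * sensitivity / epsilon


def sigmoid(z):
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def local_update(weights, data, lr):
    """One full-batch logistic-regression gradient step; returns the weight change."""
    grad = [0.0] * len(weights)
    for x, y in data:
        p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
        for j, xj in enumerate(x):
            grad[j] += (p - y) * xj / len(data)
    return [-lr * g for g in grad]


def clip_l2(vec, bound):
    """Scale vec down so that its L2 norm is at most bound."""
    norm = math.sqrt(sum(v * v for v in vec))
    factor = min(1.0, bound / norm) if norm > 0 else 1.0
    return [v * factor for v in vec]


def dp_fedavg(centers, dim, rounds, total_eps, total_delta, clip_bound, lr, seed=0):
    """Hypothetical DP federated averaging; returns (weights, per-round noise std)."""
    rng = random.Random(seed)
    # Basic sequential composition: `rounds` releases at (eps, delta) each cost
    # (rounds * eps, rounds * delta) per center, so split the budget evenly.
    eps_round = total_eps / rounds
    delta_round = total_delta / rounds
    # Any two clipped updates differ by at most 2 * clip_bound in L2 norm.
    sigma = gaussian_sigma(eps_round, delta_round, 2.0 * clip_bound)
    weights = [0.0] * dim
    for _ in range(rounds):
        noisy_updates = []
        for data in centers:
            # Each center clips and noises its own update before sending it.
            update = clip_l2(local_update(weights, data, lr), clip_bound)
            noisy_updates.append([u + rng.gauss(0.0, sigma) for u in update])
        # The server only sees privatized updates and averages them.
        weights = [
            w + sum(upd[j] for upd in noisy_updates) / len(noisy_updates)
            for j, w in enumerate(weights)
        ]
    return weights, sigma


if __name__ == "__main__":
    # Hypothetical toy data: (features, label) pairs; first feature is a bias term.
    center_a = [([1.0, 0.2], 1), ([1.0, -0.4], 0), ([1.0, 0.9], 1)]
    center_b = [([1.0, -1.1], 0), ([1.0, 0.5], 1)]
    w, s = dp_fedavg([center_a, center_b], dim=2, rounds=10, total_eps=3.0,
                     total_delta=1e-5, clip_bound=1.0, lr=0.5)
    print("noise std per coordinate and round:", round(s, 2))
    print("final weights:", w)
```

With these hypothetical settings, the per-round epsilon is 0.3. The resulting noise is large relative to the clipped updates, which illustrates the performance/budget trade-off the competition track targets. A tighter accountant, for example one based on Rényi DP, would typically permit less noise for the same total budget.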