Federated singular value decomposition for high dimensional data
This work addresses the privacy and computational constraints faced by institutions, such as hospitals, that handle sensitive biomedical data. The contribution is incremental, in that it adapts existing SVD methods to federated constraints.
The authors address singular value decomposition (SVD) of high-dimensional data in federated learning settings, such as genome-wide association studies. Their federated SVD algorithm has a transmission cost that is independent of the sample size and only weakly dependent on the number of features.
Federated learning (FL) is emerging as a privacy-aware alternative to classical cloud-based machine learning. In FL, the sensitive data remain in data silos and only aggregated parameters are exchanged. Hospitals and research institutions that are unwilling to share their data can join a federated study without breaching confidentiality. In addition to the extreme sensitivity of biomedical data, the high dimensionality poses a challenge in the context of federated genome-wide association studies (GWAS). In this article, we present a federated singular value decomposition (SVD) algorithm suited to the privacy-related and computational requirements of GWAS. Notably, the algorithm has a transmission cost that is independent of the number of samples and only weakly dependent on the number of features, because the singular vectors associated with the samples are never exchanged and the vectors associated with the features are exchanged only for a fixed number of iterations. Although motivated by GWAS, the algorithm is generically applicable to both horizontally and vertically partitioned data.
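To make the communication argument concrete, the following is a minimal sketch of a plain federated subspace (power) iteration with NumPy. It is not the algorithm presented in the article, which is not specified in this abstract. It assumes the data are split by samples, so that each site holds a block of rows covering all features. The function name `federated_subspace_iteration`, its parameters, and the synthetic data are hypothetical and chosen only for illustration. In each round a site contributes a single matrix of size (number of features) x k, so the traffic does not grow with the number of samples, and the sample-side singular vectors are computed and kept locally.

```python
import numpy as np

def federated_subspace_iteration(local_blocks, k, iters=100, seed=0):
    """Illustrative sketch only (hypothetical helper, not the article's method).

    local_blocks: list of arrays A_i with shape (n_i, m); rows are samples
    held by site i, columns are the m shared features.
    Returns the feature-side vectors V (m x k), the singular value
    estimates sigma (k,), and the per-site sample-side vectors U_i.
    """
    rng = np.random.default_rng(seed)
    m = local_blocks[0].shape[1]
    # Shared random orthonormal start for the feature-side subspace.
    V, _ = np.linalg.qr(rng.standard_normal((m, k)))
    for _ in range(iters):
        # Each site computes A_i^T (A_i V) locally. Only this m x k
        # matrix would leave the site; its size does not depend on n_i.
        H = sum(A.T @ (A @ V) for A in local_blocks)
        # The aggregator re-orthonormalises and sends V back to the sites.
        V, _ = np.linalg.qr(H)
    # Singular values from aggregated squared column norms (k numbers per site).
    sigma = np.sqrt(sum(np.sum((A @ V) ** 2, axis=0) for A in local_blocks))
    # Sample-side vectors are computed locally and never exchanged.
    U_local = [(A @ V) / sigma for A in local_blocks]
    return V, sigma, U_local

# Synthetic data with known, well-separated singular values (assumption
# made so that the iteration converges quickly in this demo).
rng = np.random.default_rng(1)
n, m, k = 60, 8, 2
U0, _ = np.linalg.qr(rng.standard_normal((n, m)))
V0, _ = np.linalg.qr(rng.standard_normal((m, m)))
s0 = np.array([10.0, 5.0, 2.0, 1.0, 0.5, 0.4, 0.3, 0.2])
A = U0 @ np.diag(s0) @ V0.T

# Three hypothetical sites holding 20, 25 and 15 samples respectively.
blocks = [A[:20], A[20:45], A[45:]]
V, sigma, U_local = federated_subspace_iteration(blocks, k)
print(sigma)
```

In this sketch the per-round message is an 8 x 2 matrix regardless of whether a site holds 15 or 15,000 samples. The number of rounds is fixed by `iters` here, and convergence speed depends on the gaps between singular values. A real deployment would also need secure aggregation of the exchanged matrices, which is outside the scope of this sketch.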