FederatedRSF: Federated Random Survival Forests for Partially Overlapping Medical Data
This work addresses the practical challenge of feature-space heterogeneity in federated survival analysis for medical institutions with privacy constraints.
FederatedRSF enables privacy-preserving survival prediction across institutions with partially overlapping feature sets by aggregating locally trained survival trees and redistributing only feature-compatible trees. On the GBSG2 breast cancer dataset, the federated model achieved performance comparable to centralized training.
Multi-center survival prediction can improve robustness and generalizability, yet privacy regulations and institutional governance often prevent pooling patient-level clinical and genomic data across institutions. In practice, deployment is further complicated by feature-space heterogeneity: sites collect different covariates or use different sequencing panels, so their feature sets overlap only partially. We present FederatedRSF, a Python package that implements federated random survival forests. It aggregates locally trained survival trees and redistributes only feature-compatible trees to each site, enabling inference under partial feature overlap without sharing raw data. We evaluate FederatedRSF on the GBSG2 breast cancer cohort distributed with the scikit-survival package, simulating feature heterogeneity across clients by withholding subsets of features and assessing discrimination with Harrell's concordance index (C-index) under repeated cross-validation and site splits. The results show that the federated model can achieve performance comparable to that of the centralized training setting.
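The redistribution step described above can be illustrated with a minimal sketch. It is not the FederatedRSF API: the `TreeRecord` class and the `pool_trees` and `compatible_trees` helpers are hypothetical names introduced here, and the sketch assumes that a tree counts as feature-compatible with a site when every covariate it splits on is collected at that site. The covariate names are GBSG2 columns; their assignment to sites is invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TreeRecord:
    """Hypothetical metadata for one locally trained survival tree."""
    origin: str            # site that trained the tree
    features: frozenset    # covariates the tree splits on


def pool_trees(local_forests):
    """Server side: concatenate the trees uploaded by every site."""
    return [tree for forest in local_forests.values() for tree in forest]


def compatible_trees(pooled, site_features):
    """Keep only trees whose split features are all available at the site.

    Assumption: compatibility means the tree's feature set is a subset of
    the site's feature set.
    """
    available = frozenset(site_features)
    return [tree for tree in pooled if tree.features <= available]


# Toy setup: two sites with partially overlapping covariates (invented split).
site_features = {
    "site_a": {"age", "tsize", "pnodes", "progrec"},
    "site_b": {"age", "tsize", "estrec"},
}
local_forests = {
    "site_a": [
        TreeRecord("site_a", frozenset({"age", "tsize"})),
        TreeRecord("site_a", frozenset({"pnodes", "progrec"})),
    ],
    "site_b": [
        TreeRecord("site_b", frozenset({"age", "estrec"})),
        TreeRecord("site_b", frozenset({"tsize"})),
    ],
}

pooled = pool_trees(local_forests)
redistributed = {
    site: compatible_trees(pooled, feats)
    for site, feats in site_features.items()
}

for site, trees in redistributed.items():
    print(site, [sorted(t.features) for t in trees])
```

In this toy setup each site ends up with its own trees plus those remote trees that use only shared covariates, so only tree structures and feature names cross site boundaries, never patient-level records.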