LG · QM · May 21

FederatedRSF: Federated Random Survival Forests for Partially Overlapping Medical Data

arXiv:2605.229547.1
AI Analysis

This work addresses the practical challenge of feature-space heterogeneity in federated survival analysis for medical institutions with privacy constraints.

FederatedRSF enables privacy-preserving survival prediction across institutions with partially overlapping feature sets by aggregating locally trained survival trees and redistributing only feature-compatible trees. On the GBSG2 breast cancer dataset, the federated model achieved performance comparable to centralized training.
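The redistribution step described above can be illustrated with a minimal sketch. It is not the FederatedRSF API: `TreeRecord`, `compatible_trees`, and `redistribute` are hypothetical names, and the sketch assumes a tree is "feature-compatible" with a site when every covariate it splits on is collected at that site. The feature names are covariates from the GBSG2 dataset; the site assignments are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class TreeRecord:
    """Hypothetical metadata for one locally trained survival tree."""
    tree_id: str
    origin_site: str
    features_used: FrozenSet[str]  # covariates the tree splits on


def compatible_trees(pool: List[TreeRecord],
                     site_features: FrozenSet[str]) -> List[TreeRecord]:
    """Keep only trees whose split features are all available at the site.

    Assumption: compatibility means features_used is a subset of the
    site's feature set. The source does not spell out the exact rule.
    """
    return [t for t in pool if t.features_used <= site_features]


def redistribute(pool: List[TreeRecord],
                 features_by_site: Dict[str, FrozenSet[str]]
                 ) -> Dict[str, List[TreeRecord]]:
    """Build each site's forest from the aggregated pool of trees."""
    return {site: compatible_trees(pool, feats)
            for site, feats in features_by_site.items()}


# Two illustrative sites with partially overlapping GBSG2 covariates.
features_by_site = {
    "site_a": frozenset({"age", "tsize", "pnodes", "progrec"}),
    "site_b": frozenset({"age", "tsize", "tgrade", "estrec"}),
}

# Aggregated pool: only tree metadata is shared, never patient records.
pool = [
    TreeRecord("a0", "site_a", frozenset({"age", "pnodes"})),
    TreeRecord("a1", "site_a", frozenset({"progrec", "tsize"})),
    TreeRecord("b0", "site_b", frozenset({"age", "tsize"})),
    TreeRecord("b1", "site_b", frozenset({"tgrade", "estrec"})),
]

forests = redistribute(pool, features_by_site)
for site, trees in forests.items():
    print(site, [t.tree_id for t in trees])
```

In this toy setup, tree `b0` splits only on shared covariates, so both sites receive it, while trees that depend on `pnodes`, `progrec`, `tgrade`, or `estrec` stay with the sites that collect those covariates.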

Multi-center survival prediction can improve robustness and generalizability, yet privacy regulations and institutional governance often prevent pooling patient-level clinical and genomic data across institutions. In practice, deployment is further complicated by feature-space heterogeneity: sites collect different covariates or use different sequencing panels, so their feature sets overlap only partially. We present FederatedRSF, a Python package that implements federated random survival forests. It aggregates locally trained survival trees and redistributes only feature-compatible trees to each site, enabling inference under partial feature overlap without sharing raw data. We evaluate FederatedRSF on the GBSG2 breast cancer cohort distributed with the scikit-survival package, simulating feature heterogeneity across clients by withholding subsets of features and assessing discrimination with Harrell's concordance index (C-index) under repeated cross-validation and site splits. The results show that the federated model can achieve performance comparable to that of centralized training.
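For readers unfamiliar with the evaluation metric, Harrell's C-index can be sketched in a few lines of plain Python. This is a simplified illustration, not the implementation used in the paper: the function name `harrell_c_index` is hypothetical, pairs with tied observed times are skipped, and tied risk scores count as half-concordant. A pair is comparable when the subject with the shorter observed time had an event; it is concordant when that subject also has the higher predicted risk.

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float],
                    events: Sequence[bool],
                    risks: Sequence[float]) -> float:
    """Simplified Harrell's C-index for right-censored data.

    times:  observed time (event or censoring) per subject
    events: True if the event was observed, False if censored
    risks:  predicted risk score (higher = earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:  # tied times are skipped (assumption)
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: four subjects, the second one is censored at time 8.
times = [5, 8, 12, 20]
events = [True, False, True, True]

perfect = harrell_c_index(times, events, [0.9, 0.4, 0.6, 0.1])
imperfect = harrell_c_index(times, events, [0.9, 0.4, 0.3, 0.6])
print(perfect, imperfect)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. In the toy data there are four comparable pairs; the second set of scores misorders one of them.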
