LG, CV · Oct 10, 2022

FLamby: Datasets and Benchmarks for Cross-Silo Federated Learning in Realistic Healthcare Settings

ETH Zurich
arXiv:2210.04620v3 · 212 citations · h-index: 58 · Has Code
AI Analysis

FLamby addresses a critical bottleneck for healthcare AI researchers by providing a practical benchmark for algorithmic development in cross-silo federated learning. The contribution is incremental, since it builds on existing FL concepts rather than proposing new ones.

The paper tackles the lack of realistic healthcare datasets for cross-silo federated learning by introducing FLamby, a suite of 7 healthcare datasets with natural splits covering multiple tasks and modalities. The suite ships with baseline training code and benchmarks of standard FL algorithms.

Federated Learning (FL) is a novel approach enabling several clients holding sensitive data to collaboratively train machine learning models without centralizing data. The cross-silo FL setting corresponds to the case of few (2 to 50) reliable clients, each holding medium to large datasets, and is typically found in applications such as healthcare, finance, or industry. While previous works have proposed representative datasets for cross-device FL, few realistic healthcare cross-silo FL datasets exist, thereby slowing algorithmic research in this critical application. In this work, we propose a novel cross-silo dataset suite focused on healthcare, FLamby (Federated Learning AMple Benchmark of Your cross-silo strategies), to bridge the gap between theory and practice of cross-silo FL. FLamby encompasses 7 healthcare datasets with natural splits, covering multiple tasks, modalities, and data volumes, each accompanied by baseline training code. As an illustration, we additionally benchmark standard FL algorithms on all datasets. Our flexible and modular suite allows researchers to easily download datasets, reproduce results and re-use the different components for their research. FLamby is available at www.github.com/owkin/flamby.
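To make the cross-silo idea concrete, here is a minimal sketch of FedAvg, a standard FL algorithm, on a toy problem. It is not FLamby's API or its benchmark code: the helpers `make_client_data`, `local_update` and `fedavg`, the synthetic 1-D linear-regression task, and all hyperparameters are hypothetical. Each of three "silos" holds a different amount of data over a different input range, loosely mimicking a natural split. Clients train locally and the server only sees model parameters, which it averages weighted by each client's sample count.

```python
import random
from typing import List, Tuple

Dataset = List[Tuple[float, float]]  # (x, y) pairs held privately by one client
Params = Tuple[float, float]         # (w, b) for the model y ~ w * x + b


def make_client_data(n: int, x_lo: float, x_hi: float, seed: int) -> Dataset:
    """Hypothetical synthetic silo: y = 2x + 1 plus small noise.
    Each client covers its own x-range, loosely mimicking site-specific data."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(x_lo, x_hi)
        data.append((x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.05)))
    return data


def local_update(params: Params, data: Dataset, lr: float, steps: int) -> Params:
    """Full-batch gradient descent on mean squared error, using only this client's data."""
    w, b = params
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2.0 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2.0 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def fedavg(clients: List[Dataset], rounds: int, lr: float = 0.1, local_steps: int = 5) -> Params:
    """FedAvg: each round, every client starts from the global model, trains locally,
    and the server averages the returned parameters weighted by local dataset size.
    Raw (x, y) records never leave their client."""
    global_params: Params = (0.0, 0.0)
    total = sum(len(d) for d in clients)
    for _ in range(rounds):
        updates = [local_update(global_params, d, lr, local_steps) for d in clients]
        w = sum(len(d) * u[0] for d, u in zip(clients, updates)) / total
        b = sum(len(d) * u[1] for d, u in zip(clients, updates)) / total
        global_params = (w, b)
    return global_params


# Three silos of very different sizes and input ranges (hypothetical values).
clients = [
    make_client_data(200, 0.0, 1.0, seed=0),
    make_client_data(50, -1.0, 0.0, seed=1),
    make_client_data(20, -0.5, 0.5, seed=2),
]
w, b = fedavg(clients, rounds=100)
print(f"w={w:.2f}, b={b:.2f}")  # should land close to the true w=2.0, b=1.0
```

Weighting by sample count lets large silos dominate the average, which is the usual FedAvg choice; because every silo here shares the same underlying relationship, the global model recovers it even though no single client sees the full input range. Real cross-silo benchmarks are harder precisely because naturally split data rarely satisfies that assumption.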

Code Implementations: 1 repo
