LG · May 23, 2024

Distributed Harmonization: Federated Clustered Batch Effect Adjustment and Generalization

arXiv:2405.15081v3 · 3 citations · h-index: 5 · Has Code · KDD
Originality: Incremental advance
AI Analysis

This work addresses the challenge of data harmonization in federated medical settings, offering an incremental improvement over existing methods like ComBat.

The paper tackles the problem of site-specific biases in federated medical data by proposing a Cluster ComBat harmonization algorithm that improves usability and generalization to new sites, demonstrating superiority through simulations and ADNI imaging data.

Many data analysis and modeling techniques assume independent and identically distributed (i.i.d.) data. In the medical domain, data are decentralized by nature, so collecting them from multiple sites or institutions is a common strategy for ensuring sufficient clinical diversity. However, data from different sites are easily biased by the local environment or facilities, which violates the i.i.d. assumption. A common remedy is to harmonize away the site bias while retaining important biological information. ComBat is among the most popular harmonization approaches and has recently been extended to handle distributed sites. However, when new sites join during training, or when data from unknown or unseen sites must be evaluated, ComBat lacks compatibility and requires retraining with data from all sites. This retraining incurs significant computational and logistical overhead that is usually prohibitive. In this work, we develop a novel Cluster ComBat harmonization algorithm, which leverages cluster patterns of the data across sites and greatly improves the usability of ComBat harmonization. We use extensive simulations and real medical imaging data from ADNI to demonstrate the superiority of the proposed approach. Our code is available at https://github.com/illidanlab/distributed-cluster-harmonization.
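The abstract does not spell out the algorithm, so the following is only a minimal sketch of one possible reading of "leveraging cluster patterns": estimate location and scale corrections per data cluster rather than per site, so that samples from an unseen site can be assigned to the nearest cluster and adjusted without refitting. It is not the authors' implementation. It omits ComBat's biological covariates and empirical Bayes shrinkage, uses a plain k-means in place of whatever clustering the paper uses, and every function name here (`fit_kmeans`, `assign_clusters`, `fit_cluster_harmonizer`, `harmonize`) is hypothetical.

```python
import numpy as np


def assign_clusters(X, centroids):
    """Return the index of the nearest centroid for each row of X."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def fit_kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (an assumption, not the paper's method)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = assign_clusters(X, centroids)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return centroids


def fit_cluster_harmonizer(X, k):
    """Estimate a per-cluster mean and std for each feature.

    Simplified location-scale model: no covariates, no shrinkage.
    """
    centroids = fit_kmeans(X, k)
    labels = assign_clusters(X, centroids)
    grand_mean = X.mean(axis=0)
    means = np.tile(grand_mean, (k, 1))
    stds = np.full((k, X.shape[1]), np.nan)
    for j in range(k):
        members = X[labels == j]
        if len(members) >= 2:
            means[j] = members.mean(axis=0)
            stds[j] = members.std(axis=0)
    # Pooled within-cluster spread, used as the common target scale.
    pooled_std = (X - means[labels]).std(axis=0)
    # Clusters with too few samples fall back to an identity transform.
    stds = np.where(np.isnan(stds), pooled_std, stds)
    stds = np.maximum(stds, 1e-8)
    return {
        "centroids": centroids,
        "means": means,
        "stds": stds,
        "grand_mean": grand_mean,
        "pooled_std": pooled_std,
    }


def harmonize(X, model):
    """Adjust samples using the parameters of their nearest cluster.

    Works for samples from sites that were not seen during fitting.
    """
    labels = assign_clusters(X, model["centroids"])
    z = (X - model["means"][labels]) / model["stds"][labels]
    return z * model["pooled_std"] + model["grand_mean"]
```

In this sketch, `harmonize` needs only the fitted cluster parameters, so adjusting data from a newly joined site does not require access to the original training data. That is the usability property the abstract emphasizes. A federated version would additionally need to compute the cluster statistics from per-site summaries, which is not shown here.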

