LG Oct 9, 2025
Unsupervised Multi-Source Federated Domain Adaptation under Domain Diversity through Group-Wise Discrepancy Minimization
Larissa Reichart, Cem Ata Baykara, Ali Burak Ünal et al.
Unsupervised multi-source domain adaptation (UMDA) aims to learn models that generalize to an unlabeled target domain by leveraging labeled data from multiple, diverse source domains. While distributed UMDA methods address privacy constraints by avoiding raw data sharing, existing approaches typically assume a small number of sources and fail to scale effectively. Increasing the number of heterogeneous domains often makes existing methods impractical, leading to high computational overhead or unstable performance. We propose GALA, a scalable and robust federated UMDA framework that introduces two key components: (1) a novel inter-group discrepancy minimization objective that efficiently approximates full pairwise domain alignment without quadratic computation; and (2) a temperature-controlled, centroid-based weighting strategy that dynamically prioritizes source domains based on alignment with the target. Together, these components enable stable and parallelizable training across large numbers of heterogeneous sources. To evaluate performance in high-diversity scenarios, we introduce Digit-18, a new benchmark comprising 18 digit datasets with varied synthetic and real-world domain shifts. Extensive experiments show that GALA consistently achieves competitive or state-of-the-art results on standard benchmarks and significantly outperforms prior methods in diverse multi-source settings where others fail to converge.
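The abstract does not give the formula behind the temperature-controlled, centroid-based weighting. As a rough illustration only, one plausible reading is a softmax over negative centroid distances, scaled by a temperature. Everything in the sketch below is an assumption for illustration and not GALA's actual method: the function names `centroid` and `source_weights`, the Euclidean distance, and the use of a single per-domain centroid.

```python
import math

def centroid(points):
    """Mean feature vector of a list of equal-length feature vectors."""
    n = len(points)
    dim = len(points[0])
    return [sum(p[i] for p in points) / n for i in range(dim)]

def source_weights(source_feats, target_feats, temperature=1.0):
    """Hypothetical weighting: softmax over negative centroid distances.

    source_feats: list of per-source feature lists (one entry per source domain).
    target_feats: feature list for the unlabeled target domain.
    temperature:  lower values concentrate weight on the closest sources.
    """
    target_c = centroid(target_feats)
    # Distance from each source centroid to the target centroid (assumed Euclidean).
    dists = [math.dist(centroid(f), target_c) for f in source_feats]
    logits = [-d / temperature for d in dists]
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

# Toy example: source 0 lies near the target, source 1 lies far away.
near = [[0.0, 0.0], [0.0, 2.0]]
far = [[10.0, 10.0], [10.0, 12.0]]
target = [[0.0, 0.0], [2.0, 0.0]]
soft = source_weights([near, far], target, temperature=100.0)
sharp = source_weights([near, far], target, temperature=1.0)
```

In this toy setting, a high temperature keeps the two weights close to uniform, while a low temperature pushes almost all of the weight onto the source whose centroid is nearest the target.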
LG Sep 12, 2025
Accurate and Private Diagnosis of Rare Genetic Syndromes from Facial Images with Federated Deep Learning
Ali Burak Ünal, Cem Ata Baykara, Peter Krawitz et al.
Machine learning has shown promise in facial dysmorphology, where characteristic facial features provide diagnostic clues for rare genetic disorders. GestaltMatcher, a leading framework in this field, has demonstrated clinical utility across multiple studies, but its reliance on centralized datasets limits further development, as patient data are siloed across institutions and subject to strict privacy regulations. We introduce a federated GestaltMatcher service based on a cross-silo horizontal federated learning framework, which allows hospitals to collaboratively train a global ensemble feature extractor without sharing patient images. Patient data are mapped into a shared latent space, and a privacy-preserving kernel matrix computation framework enables syndrome inference and discovery while safeguarding confidentiality. New participants can directly benefit from and contribute to the system by adopting the global feature extractor and kernel configuration from previous training rounds. Experiments show that the federated service retains over 90% of centralized performance and remains robust to both varying silo numbers and heterogeneous data distributions.
LG Aug 11, 2025
Federated Learning for Epileptic Seizure Prediction Across Heterogeneous EEG Datasets
Cem Ata Baykara, Saurav Raj Pandey, Ali Burak Ünal et al.
Developing accurate and generalizable epileptic seizure prediction models from electroencephalography (EEG) data across multiple clinical sites is hindered by patient privacy regulations and significant data heterogeneity (non-IID characteristics). Federated Learning (FL) offers a privacy-preserving framework for collaborative training, but standard aggregation methods like Federated Averaging (FedAvg) can be biased by dominant datasets in heterogeneous settings. This paper investigates FL for seizure prediction using a single EEG channel across four diverse public datasets (Siena, CHB-MIT, Helsinki, NCH), representing distinct patient populations (adult, pediatric, neonate) and recording conditions. We implement privacy-preserving global normalization and propose a Random Subset Aggregation strategy, where each client trains on a fixed-size random subset of its data per round, ensuring equal contribution during aggregation. Our results show that locally trained models fail to generalize across sites, and standard weighted FedAvg yields highly skewed performance (e.g., 89.0% accuracy on CHB-MIT but only 50.8% on Helsinki and 50.6% on NCH). In contrast, Random Subset Aggregation significantly improves performance on under-represented clients (accuracy increases to 81.7% on Helsinki and 68.7% on NCH) and achieves a superior macro-average accuracy of 77.1% and pooled accuracy of 80.0% across all sites, demonstrating a more robust and fair global model. This work highlights the potential of balanced FL approaches for building effective and generalizable seizure prediction systems in realistic, heterogeneous multi-hospital environments while respecting data privacy.
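To make the contrast between size-weighted FedAvg and Random Subset Aggregation concrete, here is a minimal sketch. It assumes model parameters are flat lists of floats and that aggregation weights are proportional to the number of samples each client trained on in the round. The helper names `fedavg` and `sample_fixed_subset`, the client sizes, and the parameter values are all hypothetical and not taken from the paper.

```python
import random

def fedavg(client_params, client_sizes):
    """Sample-count-weighted average of client parameter vectors."""
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(p[i] * n for p, n in zip(client_params, client_sizes)) / total
        for i in range(dim)
    ]

def sample_fixed_subset(dataset, subset_size, rng):
    """Draw a fixed-size random subset for one round (capped at dataset size)."""
    k = min(subset_size, len(dataset))
    return rng.sample(dataset, k)

rng = random.Random(0)

# Two hypothetical clients: one large site and one small site.
large_site = list(range(1000))
small_site = list(range(10))

# Stand-in parameters "learned" at each site (illustrative values only).
params = [[1.0], [0.0]]

# Standard weighted FedAvg: the large site dominates the global model.
weighted = fedavg(params, [len(large_site), len(small_site)])

# Random subset variant: both sites train on the same number of samples,
# so the sample-count weights are equal and each site contributes equally.
subsets = [sample_fixed_subset(d, 10, rng) for d in (large_site, small_site)]
balanced = fedavg(params, [len(s) for s in subsets])
```

With these toy numbers, the weighted average sits at roughly 0.99, almost entirely determined by the large site, while the balanced average is 0.5. This mirrors the abstract's point that equal per-round contributions keep dominant datasets from skewing the global model.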
LG Nov 26, 2024
Privacy-Preserving Federated Unsupervised Domain Adaptation for Regression on Small-Scale and High-Dimensional Biological Data
Cem Ata Baykara, Ali Burak Ünal, Nico Pfeifer et al.
Machine learning models often struggle to generalize on small, heterogeneous datasets because of domain shifts caused by variations in data collection and population differences. This challenge is particularly pronounced in biological data, which is high-dimensional, small-scale, and decentralized across institutions. While federated domain adaptation (FDA) methods aim to address these challenges, most existing approaches rely on deep learning and focus on classification tasks, making them unsuitable for small-scale, high-dimensional applications. In this work, we propose freda, a privacy-preserving federated method for unsupervised domain adaptation in regression tasks. Unlike deep learning-based FDA approaches, freda is the first method to enable the federated training of Gaussian Processes to model complex feature relationships while ensuring complete data privacy through randomized encoding and secure aggregation. This allows for effective domain adaptation without direct access to raw data, making it well-suited for applications involving high-dimensional, heterogeneous datasets. We evaluate freda on the challenging task of age prediction from DNA methylation data, demonstrating that it achieves performance comparable to the centralized state-of-the-art method while preserving complete data privacy.