LG Sep 3, 2024
Collaboratively Learning Federated Models from Noisy Decentralized Data
Haoyuan Li, Mathias Funk, Nezihe Merve Gürel et al.
Federated learning (FL) has emerged as a prominent method for collaboratively training machine learning models using local data from edge devices, all while keeping data decentralized. However, accounting for the quality of data contributed by local clients remains a critical challenge in FL, as local data are often susceptible to corruption by various forms of noise and perturbations, which compromise the aggregation process and lead to a subpar global model. In this work, we focus on addressing the problem of noisy data in the input space, an under-explored area compared to label noise. We propose a comprehensive assessment of client input in the gradient space, inspired by the distinct disparity observed between the density of gradient norm distributions of models trained on noisy and clean input data. Based on this observation, we introduce a straightforward yet effective approach to identify clients with low-quality data at the initial stage of FL. Furthermore, we propose a noise-aware FL aggregation method, namely Federated Noise-Sifting (FedNS), which can be used as a plug-in approach in conjunction with widely used FL strategies. Our extensive evaluation on diverse benchmark datasets under different federated settings demonstrates the efficacy of FedNS. Our method integrates readily with existing FL strategies, enhancing the global model's performance by up to 13.68% in IID and 15.85% in non-IID settings when learning from noisy decentralized data.
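The idea of screening clients in gradient space and then aggregating with noise awareness can be illustrated with a minimal sketch. This is not the FedNS algorithm from the paper: the function names, the median/MAD outlier rule, the assumption that noisy clients show larger gradient norms, and the `noisy_weight` down-weighting factor are all hypothetical choices made only for illustration.

```python
import numpy as np

def flag_noisy_clients(grad_norms, threshold=3.0):
    """Flag clients whose mean gradient norm is an outlier (assumed rule).

    grad_norms: dict mapping client id -> sequence of per-sample gradient norms.
    A client is flagged when its mean norm lies more than `threshold`
    median absolute deviations above the median across clients.
    """
    ids = sorted(grad_norms)
    means = np.array([np.mean(grad_norms[c]) for c in ids])
    median = np.median(means)
    mad = np.median(np.abs(means - median)) + 1e-12  # avoid division by zero
    scores = (means - median) / mad
    return {c: bool(s > threshold) for c, s in zip(ids, scores)}

def noise_aware_average(updates, num_samples, noisy, noisy_weight=0.5):
    """Sample-weighted average of client updates, down-weighting flagged clients.

    updates: dict client id -> np.ndarray of model parameters (same shape).
    noisy_weight: hypothetical factor in [0, 1] applied to flagged clients.
    """
    ids = sorted(updates)
    weights = np.array(
        [num_samples[c] * (noisy_weight if noisy[c] else 1.0) for c in ids],
        dtype=float,
    )
    weights /= weights.sum()  # assumes at least one client keeps nonzero weight
    return sum(w * updates[c] for w, c in zip(weights, ids))
```

With `noisy_weight=1.0` the second function reduces to plain sample-weighted averaging, which is what makes this kind of reweighting usable as a plug-in on top of an existing aggregation rule.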
LG Apr 3, 2025
FAST: Federated Active Learning with Foundation Models for Communication-efficient Sampling and Training
Haoyuan Li, Mathias Funk, Jindong Wang et al.
Federated Active Learning (FAL) has emerged as a promising framework to leverage large quantities of unlabeled data across distributed clients while preserving data privacy. However, real-world deployments remain limited by high annotation costs and communication-intensive sampling processes, particularly in cross-silo settings, where clients possess substantial local datasets. This paper addresses the crucial question: What is the best practice to reduce communication costs in human-in-the-loop learning with minimal annotator effort? Existing FAL methods typically rely on iterative annotation processes that separate active sampling from federated updates, leading to multiple rounds of expensive communication and annotation. In response, we introduce FAST, a two-pass FAL framework that harnesses foundation models for weak labeling in a preliminary pass, followed by a refinement pass focused exclusively on the most uncertain samples. By leveraging representation knowledge from foundation models and integrating refinement steps into a streamlined workflow, FAST substantially reduces the overhead incurred by iterative active sampling. Extensive experiments on diverse medical and natural image benchmarks demonstrate that FAST outperforms existing FAL methods by an average of 4.36% while reducing communication rounds eightfold under a limited 5% labeling budget.
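The two-pass structure described above (weak labels first, then human annotation only for the most uncertain samples) can be sketched on a single client as follows. This is an illustrative assumption, not the FAST implementation: the function name is hypothetical, the foundation model is represented only by its predicted class probabilities, and predictive entropy is assumed as the uncertainty measure.

```python
import numpy as np

def select_for_refinement(probs, budget_frac=0.05):
    """Return weak labels and the indices to send to a human annotator.

    probs: array of shape (n_samples, n_classes) with class probabilities
           produced by some foundation model (assumed given).
    budget_frac: fraction of samples the annotator can label.
    """
    probs = np.asarray(probs, dtype=float)
    # Pass 1: weak labels are the most likely class for every sample.
    weak_labels = probs.argmax(axis=1)
    # Pass 2: rank samples by predictive entropy (assumed uncertainty score).
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    k = max(1, int(round(budget_frac * len(probs))))
    query_idx = np.argsort(-entropy)[:k]
    return weak_labels, np.sort(query_idx)
```

Because both passes run locally on precomputed probabilities, the selection requires no extra communication round in this sketch; only the refined labels would feed into subsequent federated training.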
AI Oct 16, 2025
Helmsman: Autonomous Synthesis of Federated Learning Systems via Multi-Agent Collaboration
Haoyuan Li, Mathias Funk, Aaqib Saeed
Federated Learning (FL) offers a powerful paradigm for training models on decentralized data, but its promise is often undermined by the immense complexity of designing and deploying robust systems. The need to select, combine, and tune strategies for multifaceted challenges like data heterogeneity and system constraints has become a critical bottleneck, resulting in brittle, bespoke solutions. To address this, we introduce Helmsman, a novel multi-agent system that automates the end-to-end synthesis of federated learning systems from high-level user specifications. It emulates a principled research and development workflow through three collaborative phases: (1) interactive human-in-the-loop planning to formulate a sound research plan, (2) modular code generation by supervised agent teams, and (3) a closed loop of autonomous evaluation and refinement in a sandboxed simulation environment. To facilitate rigorous evaluation, we also introduce AgentFL-Bench, a new benchmark comprising 16 diverse tasks designed to assess the system-level generation capabilities of agentic systems in FL. Extensive experiments demonstrate that our approach generates solutions competitive with, and often superior to, established hand-crafted baselines. Our work represents a significant step towards the automated engineering of complex decentralized AI systems.
HC Oct 7, 2020
Sonification of Facial Actions for Musical Expression
Mathias Funk, Kazuhiro Kuwabara, Michael J. Lyons
The central role of the face in social interaction and non-verbal communication suggests that we explore facial action as a means of musical expression. This paper presents the design, implementation, and preliminary studies of a novel system utilizing face detection and optic flow algorithms to associate facial movements with sound synthesis in a topographically specific fashion. We report on our experience with various gesture-to-sound mappings and applications, and describe our preliminary experiments in musical performance using the system.
HC Jan 30, 2020
Visual Exploration of Movement Relatedness for Multi-species Ecology Analysis
Wei Li, Mathias Funk, Jasper Eikelboom et al.
Advances in GPS telemetry technology have enabled the analysis of animal movement in open areas. Ecologists today use modern analytic tools to study animal behavior from large quantities of GPS coordinates. Analytic tools with automatic event extraction functionality can be used to investigate potential interactions between animals by locating relevant segments in movement trajectories. However, such automation can easily overlook the spatial, temporal, and social context as well as potential data problems. To this end, this paper explores visual presentations that also clarify the spatial-temporal contexts, social surroundings, and underlying data uncertainties of multi-species animal interactions. The resulting system presents the proximity-based, time-varying relatedness between animal entities through pairwise (PW) or individual-to-group (i-G) perspectives. Focusing on the relational aspects, we employ both static depictions and animations to communicate the travelling of individuals. Our contribution is a novel visualization system that helps investigate the subtle variations of long-term spatial-temporal relatedness while considering small group patterns. Our evaluation with movement ecologists shows that the system gives them quick access to valuable clues in discovering insights into multi-species movements and signs of potential interactions.