ProbSelect: Stochastic Client Selection for GPU-Accelerated Compute Devices in the 3D Continuum
This work addresses efficient resource management in dynamic 3D continuum systems, a known bottleneck for federated learning practitioners, and proposes a novel method to tackle it.
The paper tackles client selection for GPU-accelerated devices in federated learning across edge, cloud, and space environments by introducing ProbSelect, which improves SLO compliance by 13.77% on average and reduces computational waste by 72.5% compared to baselines.
Integrating edge, cloud, and space devices into a unified 3D continuum poses significant challenges for client selection in federated learning systems. Traditional approaches rely on continuous monitoring and historical data collection, which become impractical in dynamic environments where satellites and mobile devices frequently change operational conditions. Furthermore, existing solutions primarily consider CPU-based computation and fail to capture the complex characteristics of GPU-accelerated training, which is prevalent across the 3D continuum. This paper introduces ProbSelect, a novel approach that uses analytical modeling and probabilistic forecasting for client selection on GPU-accelerated devices without requiring historical data or continuous monitoring. We model client selection within user-defined SLOs. Extensive evaluation across diverse GPU architectures and workloads demonstrates that ProbSelect improves SLO compliance by 13.77% on average while reducing computational waste by 72.5% compared to baseline approaches.
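The abstract does not specify ProbSelect's actual model, so the following is only a minimal sketch of the general idea behind SLO-aware probabilistic client selection, not the paper's algorithm. It assumes each client's round time is forecast as a Normal distribution whose mean and standard deviation come from some analytical model rather than historical logs; the `Client` fields, the `min_prob` threshold, `max_clients`, and the client names are all hypothetical. A client is selected only if its estimated probability of finishing before the SLO deadline is high enough, which is one simple way to trade SLO compliance against wasted computation on clients unlikely to finish.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Client:
    # Hypothetical per-client forecast; in this sketch the numbers would come
    # from an analytical model of GPU training time, not from monitoring history.
    name: str
    mean_time_s: float  # predicted mean round time in seconds
    std_time_s: float   # predicted uncertainty (standard deviation) in seconds


def prob_within_slo(client: Client, deadline_s: float) -> float:
    """Return P(round time <= deadline) under an assumed Normal model."""
    if client.std_time_s <= 0:
        # Deterministic forecast: either it meets the deadline or it does not.
        return 1.0 if client.mean_time_s <= deadline_s else 0.0
    z = (deadline_s - client.mean_time_s) / client.std_time_s
    # Standard normal CDF written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def select_clients(
    clients: List[Client],
    deadline_s: float,
    min_prob: float = 0.9,
    max_clients: Optional[int] = None,
) -> List[str]:
    """Keep clients likely to meet the SLO, ordered from most to least likely."""
    scored = [(prob_within_slo(c, deadline_s), c) for c in clients]
    eligible = [(p, c) for p, c in scored if p >= min_prob]
    eligible.sort(key=lambda pc: pc[0], reverse=True)
    if max_clients is not None:
        eligible = eligible[:max_clients]
    return [c.name for _, c in eligible]


# Usage with made-up forecasts for three heterogeneous GPU clients.
clients = [
    Client("edge-jetson", mean_time_s=40.0, std_time_s=5.0),
    Client("cloud-a100", mean_time_s=10.0, std_time_s=1.0),
    Client("sat-gpu", mean_time_s=55.0, std_time_s=10.0),
]
print(select_clients(clients, deadline_s=50.0))  # ['cloud-a100', 'edge-jetson']
```

With a 50 s deadline, the satellite client has only about a 31% chance of finishing in time under these illustrative numbers, so it is skipped rather than assigned work that would likely be wasted.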