FLINT: A Platform for Federated Learning Integration
FLINT addresses the risks and costs that developers and organizations face when transitioning to federated learning, although the contribution is incremental because it builds on existing FL and platform work.
The paper tackles the challenges of deploying cross-device federated learning at scale by introducing a platform that integrates with existing ML systems to measure real-world constraints and evaluate trade-offs, and it reports empirical evaluations on applications that affect hundreds of millions of users.
Cross-device federated learning (FL) has been well-studied from algorithmic, system scalability, and training speed perspectives. Nonetheless, moving from centralized training to cross-device FL for millions or billions of devices presents many risks, including performance loss, developer inertia, poor user experience, and unexpected application failures. In addition, the corresponding infrastructure, development costs, and return on investment are difficult to estimate. In this paper, we present a device-cloud collaborative FL platform that integrates with an existing machine learning platform, providing tools to measure real-world constraints, assess infrastructure capabilities, evaluate model training performance, and estimate system resource requirements to responsibly bring FL into production. We also present a decision workflow that leverages the FL-integrated platform to comprehensively evaluate the trade-offs of cross-device FL and share our empirical evaluations of business-critical machine learning applications that impact hundreds of millions of users.