APPFL: Open-Source Software Framework for Privacy-Preserving Federated Learning
This framework addresses the need for flexible and scalable privacy-preserving FL tools, particularly in domains such as biomedicine and smart grid where data sharing is restricted. The contribution is incremental in that it builds on existing FL research.
The authors introduce APPFL, an open-source software framework for privacy-preserving federated learning. It lets users implement and deploy FL algorithms with privacy-preserving techniques, and it includes a new communication-efficient algorithm that requires less server-client communication than state-of-the-art methods.
Federated learning (FL) enables training models at different sites and updating the weights from the training instead of transferring data to a central location and training as in classical machine learning. The FL capability is especially important to domains such as biomedicine and smart grid, where data may not be shared freely or stored at a central location because of policy challenges. Thanks to the capability of learning from decentralized datasets, FL is now a rapidly growing research field, and numerous FL frameworks have been developed. In this work, we introduce APPFL, the Argonne Privacy-Preserving Federated Learning framework. APPFL allows users to leverage implemented privacy-preserving algorithms, implement new algorithms, and simulate and deploy various FL algorithms with privacy-preserving techniques. The modular framework enables users to customize the components for algorithms, privacy, communication protocols, neural network models, and user data. We also present a new communication-efficient algorithm based on an inexact alternating direction method of multipliers. The algorithm requires significantly less communication between the server and the clients than does the current state of the art. We demonstrate the computational capabilities of APPFL, including differentially private FL on various test datasets and its scalability, by using multiple algorithms and datasets on different computing environments.
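The basic FL loop described above (clients train locally, and only model weights travel to the server) can be illustrated with a minimal sketch. This is not APPFL's API, and it is not the inexact-ADMM algorithm from the paper: it uses plain federated averaging on a toy least-squares problem, with an optional Laplace perturbation of each client's weights as a simple stand-in for a differential privacy mechanism. All function names, the clipping bound, and the synthetic data are assumptions made for illustration. The noise scale gives a per-release guarantee only and does not account for composition across rounds.

```python
import numpy as np

def local_update(w, X, y, lr=0.1, steps=5):
    """Run a few gradient-descent steps on one client's least-squares loss."""
    w = w.copy()
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def privatize(w, clip=5.0, epsilon=None, rng=None):
    """Clip w in L1 norm, then optionally add Laplace noise (hypothetical helper).

    With epsilon=None no noise is added. Otherwise the scale 2*clip/epsilon
    matches the L1 sensitivity of a vector clipped to norm `clip`.
    """
    norm = np.abs(w).sum()
    w = w * min(1.0, clip / max(norm, 1e-12))
    if epsilon is None:
        return w
    rng = rng or np.random.default_rng()
    return w + rng.laplace(scale=2.0 * clip / epsilon, size=w.shape)

def federated_average(clients, dim, rounds=50, epsilon=None, seed=0):
    """Server loop: broadcast w, collect client weights, average by data size."""
    rng = np.random.default_rng(seed)
    w = np.zeros(dim)
    sizes = np.array([len(y) for _, y in clients], dtype=float)
    for _ in range(rounds):
        # Raw data (X, y) never leaves the client; only weights are returned.
        updates = [
            privatize(local_update(w, X, y), epsilon=epsilon, rng=rng)
            for X, y in clients
        ]
        w = np.average(updates, axis=0, weights=sizes)
    return w

def make_clients(n_clients=4, n=200, seed=1):
    """Synthetic data: every client draws from the same linear model."""
    rng = np.random.default_rng(seed)
    w_true = np.array([1.0, -2.0, 0.5])
    clients = []
    for _ in range(n_clients):
        X = rng.normal(size=(n, 3))
        y = X @ w_true + 0.01 * rng.normal(size=n)
        clients.append((X, y))
    return w_true, clients

if __name__ == "__main__":
    w_true, clients = make_clients()
    print("no noise:   ", federated_average(clients, dim=3))
    print("epsilon=50: ", federated_average(clients, dim=3, epsilon=50.0))
```

Passing a smaller `epsilon` adds more noise to each client's weights, so the averaged model drifts further from the noise-free solution; this is the usual privacy and accuracy trade-off that a differentially private FL run has to balance.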