Sample-based and Feature-based Federated Learning for Unconstrained and Constrained Nonconvex Optimization via Mini-batch SSCA
This addresses the problem of slow convergence and lack of methods for constrained optimization in federated learning for clients with sensitive data, though it appears incremental as it builds on existing techniques like SSCA and momentum SGD.
The paper tackles unconstrained and constrained nonconvex optimization in federated learning by proposing algorithms based on stochastic successive convex approximation and mini-batch techniques, showing convergence to stationary or KKT points and demonstrating advantages in convergence speed, communication, and computation costs.
Federated learning (FL) has become a hot research area in enabling the collaborative training of machine learning models among multiple clients that hold sensitive local data. Nevertheless, unconstrained federated optimization has been studied mainly using stochastic gradient descent (SGD), which may converge slowly, and constrained federated optimization, which is more challenging, has not been investigated so far. This paper investigates sample-based and feature-based federated optimization, respectively, and considers both unconstrained and constrained nonconvex problems for each of them. First, we propose FL algorithms using stochastic successive convex approximation (SSCA) and mini-batch techniques. These algorithms can adequately exploit the structures of the objective and constraint functions and incrementally utilize samples. We show that the proposed FL algorithms converge to stationary points and Karush-Kuhn-Tucker (KKT) points of the respective unconstrained and constrained nonconvex problems, respectively. Next, we provide algorithm examples with appealing computational complexity and communication load per communication round. We show that the proposed algorithm examples for unconstrained federated optimization are identical to FL algorithms via momentum SGD and provide an analytical connection between SSCA and momentum SGD. Finally, numerical experiments demonstrate the inherent advantages of the proposed algorithms in convergence speeds, communication and computation costs, and model specifications.