Compressed Proximal Federated Learning for Non-Convex Composite Optimization on Heterogeneous Data
This work addresses efficient and robust federated learning for models with structural constraints on distributed edge networks. It offers an incremental improvement for practitioners in federated learning.
This paper tackles the challenge of communication efficiency and convergence robustness in Federated Composite Optimization (FCO) for non-convex problems with heterogeneous data. The proposed FedCEF algorithm maintains competitive model accuracy even with extreme compression ratios (e.g., 1%), significantly reducing total communication volume compared to uncompressed baselines.
Federated Composite Optimization (FCO) has emerged as a promising framework for training models with structural constraints (e.g., sparsity) in distributed edge networks. However, simultaneously achieving communication efficiency and convergence robustness remains a significant challenge, particularly when dealing with non-smooth regularizers, statistical heterogeneity, and the restrictions of biased compression. To address these issues, we propose FedCEF (Federated Composite Error Feedback), a novel algorithm tailored for non-convex FCO. FedCEF introduces a decoupled proximal update scheme that separates the proximal operator from communication, enabling clients to handle non-smooth terms locally while transmitting compressed information. To mitigate the noise from aggressive quantization and the bias from non-IID data, FedCEF integrates a rigorous error feedback mechanism with control variates. Furthermore, we design a communication-efficient pre-proximal downlink strategy that allows clients to exactly reconstruct global control variables without explicit transmission. We theoretically establish that, under general non-convexity, FedCEF converges at a sublinear rate to a bounded residual error whose size is controllable via the step size and batch size. Extensive experiments on real datasets validate that FedCEF maintains competitive model accuracy even under extreme compression ratios (e.g., 1%), significantly reducing the total communication volume compared to uncompressed baselines.
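The two generic building blocks named in the abstract, a proximal operator for a non-smooth regularizer and error feedback around a biased compressor, can be illustrated with a minimal sketch. This is not the FedCEF algorithm: the abstract does not specify its update rules, control variates, or downlink strategy. The sketch assumes an L1 regularizer (so the proximal operator is soft-thresholding) and a Top-k compressor, and the names `soft_threshold`, `top_k`, and `compress_with_error_feedback` are hypothetical helpers chosen for illustration.

```python
from typing import List, Tuple

Vector = List[float]


def soft_threshold(v: Vector, tau: float) -> Vector:
    """Proximal operator of tau * ||x||_1 (assumed regularizer, for illustration)."""
    out = []
    for x in v:
        if x > tau:
            out.append(x - tau)
        elif x < -tau:
            out.append(x + tau)
        else:
            out.append(0.0)
    return out


def top_k(v: Vector, k: int) -> Vector:
    """Biased Top-k compressor: keep the k largest-magnitude entries, zero the rest."""
    order = sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)
    keep = set(order[:k])
    return [x if i in keep else 0.0 for i, x in enumerate(v)]


def compress_with_error_feedback(
    update: Vector, residual: Vector, k: int
) -> Tuple[Vector, Vector]:
    """Add the carried-over residual, compress, and store what was dropped."""
    corrected = [u + r for u, r in zip(update, residual)]
    message = top_k(corrected, k)
    new_residual = [c - m for c, m in zip(corrected, message)]
    return message, new_residual


# Two rounds on a toy 4-dimensional update, sending 1 coordinate per round.
residual = [0.0, 0.0, 0.0, 0.0]
msg1, residual = compress_with_error_feedback([0.5, -2.0, 0.25, 1.0], residual, k=1)
msg2, residual = compress_with_error_feedback([0.5, 0.0, 0.25, 1.0], residual, k=1)
print(msg1, msg2, residual)
print(soft_threshold([1.5, -0.25, 0.5], 0.5))
```

In this toy run, the coordinate dropped in the first round (value 1.0 at index 3) is carried in the residual and transmitted in the second round, which is the mechanism error feedback relies on to compensate for the bias of the compressor. Applying the proximal step locally, outside the compressed message, mirrors the decoupling the abstract describes only at a conceptual level.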