NI · Mar 16

A Hierarchical Gradient Tracking Algorithm for Mitigating Subnet-Drift in Fog Learning Networks

Evan Chen, Shiqiang Wang, Christopher G. Brinton

arXiv:2409.17430 · 85.4 · h-index: 9

AI Analysis

This work addresses scalability and performance issues in federated learning over fog networks with heterogeneous data distributions. It represents an incremental advancement in SD-FL methods.

The paper tackles the problem of subnet-drift in semi-decentralized federated learning (SD-FL) for fog networks by developing SD-GT, a hierarchical gradient tracking algorithm that removes gradient diversity assumptions, resulting in improved model quality and reduced communication costs compared to baselines.

Federated learning (FL) encounters scalability challenges when implemented over fog networks that do not follow FL's conventional star topology architecture. Semi-decentralized FL (SD-FL) offers a solution for device-to-device (D2D) enabled networks by dividing model cooperation into two stages: at the lower stage, D2D communications are employed for local model aggregations within subnetworks (subnets), while the upper stage handles device-server (DS) communications for global model aggregations. However, existing SD-FL schemes rely on gradient diversity assumptions that become performance bottlenecks as data distributions become more heterogeneous. In this work, we develop semi-decentralized gradient tracking (SD-GT), the first SD-FL methodology that removes the need for such assumptions by incorporating tracking terms into device updates for each communication layer. Our analytical characterization of SD-GT reveals upper bounds on convergence for non-convex, convex, and strongly-convex problems. We show how the bounds enable the development of an optimization algorithm that navigates the performance-efficiency trade-off by tuning the subnet sampling rate and the number of D2D rounds for each global training interval. Our subsequent numerical evaluations demonstrate that SD-GT obtains substantial improvements in trained model quality and communication cost relative to baselines in SD-FL and gradient tracking on several datasets.
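
The two-stage communication pattern described in the abstract can be illustrated with a minimal sketch. This is not the authors' SD-GT algorithm: it omits local gradient steps and the tracking terms entirely, and only shows D2D averaging inside each subnet followed by a server aggregation over sampled subnets. The ring topology, the equal mixing weights, the scalar "models", and all function names (`ring_neighbors`, `d2d_round`, `local_stage`, `global_stage`) are assumptions made for illustration.

```python
def ring_neighbors(n):
    """Hypothetical D2D topology: each device talks to its two ring neighbors (n >= 3)."""
    return [[(i - 1) % n, (i + 1) % n] for i in range(n)]


def d2d_round(models, neighbors):
    """One D2D round: each device averages its model with its neighbors' models."""
    mixed = []
    for i in range(len(models)):
        group = [i] + neighbors[i]
        mixed.append(sum(models[j] for j in group) / len(group))
    return mixed


def local_stage(models, d2d_rounds):
    """Lower stage: run several D2D averaging rounds inside one subnet."""
    neighbors = ring_neighbors(len(models))
    for _ in range(d2d_rounds):
        models = d2d_round(models, neighbors)
    return models


def global_stage(subnets, sampled):
    """Upper stage: the server averages the models of the sampled subnets
    and broadcasts the result to every device (an assumed aggregation rule)."""
    collected = [m for s in sampled for m in subnets[s]]
    global_model = sum(collected) / len(collected)
    return [[global_model] * len(s) for s in subnets]


# Two subnets of four devices each, with scalar models for simplicity.
subnets = [[0.0, 4.0, 8.0, 12.0], [1.0, 1.0, 1.0, 1.0]]
mixed = [local_stage(s, d2d_rounds=2) for s in subnets]
broadcast = global_stage(mixed, sampled=[0, 1])
print(mixed)
print(broadcast)
```

In this sketch, `d2d_rounds` and the `sampled` list stand in for the two quantities the abstract says are tuned per global training interval: the number of D2D rounds and the subnet sampling rate. More D2D rounds bring devices within a subnet closer to agreement without contacting the server, while sampling fewer subnets reduces device-server traffic at the cost of a less representative global average.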
