Sparse Mean Field Load Balancing in Large Localized Queueing Systems
This work addresses scalable load balancing for cloud networks and data centers, particularly in sparsely connected wireless environments, offering a tractable solution with theoretical guarantees, though it appears incremental as it builds on sparse mean field theory.
The authors tackled the problem of scalable load balancing in large queueing networks with strong locality, which existing mean field methods could not model well, by developing a sparse mean field control framework that reduces the multi-agent problem to a single-agent one and demonstrated competitive performance on realistic wireless topologies compared to heuristics and scalable multi-agent reinforcement learning methods.
Scalable load balancing algorithms are of great interest in cloud networks and data centers, necessitating the use of tractable techniques to compute optimal load balancing policies for good performance. However, most existing scalable techniques, especially asymptotically scaling methods based on mean field theory, have not been able to model large queueing networks with strong locality. Meanwhile, general multi-agent reinforcement learning techniques can be hard to scale and usually lack a theoretical foundation. In this work, we address this challenge by leveraging recent advances in sparse mean field theory to learn a near-optimal load balancing policy in sparsely connected queueing networks in a tractable manner, which may be preferable to global approaches in terms of wireless communication overhead. Importantly, we obtain a general load balancing framework for a large class of sparse bounded-degree wireless topologies. By formulating a novel mean field control problem in the context of graphs with bounded degree, we reduce the otherwise difficult multi-agent problem to a single-agent problem. Theoretically, the approach is justified by approximation guarantees. Empirically, the proposed methodology performs well on several realistic and scalable wireless network topologies as compared to a number of well-known load balancing heuristics and existing scalable multi-agent reinforcement learning methods.