DC LG NI | Jan 27, 2022

Multi-Agent Reinforcement Learning for Network Load Balancing in Data Center

arXiv:2201.11727v4 | 11 citations
AI Analysis

This work addresses load balancing challenges for data center operators and offers a more flexible and effective solution than existing heuristic methods, though its contribution is incremental: it applies existing MARL techniques to this domain.

The paper tackles the network load balancing problem in data centers by applying multi-agent reinforcement learning (MARL) to overcome the inflexibility of traditional heuristic methods like WCMP and LSQ, achieving superior performance in realistic testbed experiments.

This paper presents the network load balancing problem, a challenging real-world task for multi-agent reinforcement learning (MARL) methods. Traditional heuristic solutions like Weighted-Cost Multi-Path (WCMP) and Local Shortest Queue (LSQ) adapt poorly to changing workload distributions and arrival rates, and they balance load poorly across multiple load balancers. The cooperative network load balancing task is formulated as a Dec-POMDP problem, which naturally motivates the use of MARL methods. To bridge the reality gap for applying learning-based methods, all methods are directly trained and evaluated on an emulation system at moderate to large scale. Experiments on realistic testbeds show that independent and "selfish" load balancing strategies are not necessarily the globally optimal ones, while the proposed MARL solution performs better across different realistic settings. Additionally, the potential difficulties of MARL methods for network load balancing are analysed, which helps to draw the attention of the learning and network communities to such challenges.
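To make the two heuristic baselines concrete, here is a minimal Python sketch, not the paper's implementation. It assumes WCMP picks a server with probability proportional to a static weight, and that LSQ picks the server with the fewest jobs the load balancer itself has in flight. The function names and the toy numbers are hypothetical and chosen only for illustration.

```python
import random
from typing import Sequence


def wcmp_pick(weights: Sequence[float], rng: random.Random) -> int:
    """Assumed WCMP rule: sample a server index proportionally to static weights."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def lsq_pick(queue_lengths: Sequence[int]) -> int:
    """Assumed LSQ rule: pick the server with the shortest observed queue."""
    return min(range(len(queue_lengths)), key=lambda i: queue_lengths[i])


# Two load balancers share two servers; each sees only its own in-flight jobs.
local_a = [0, 2]  # jobs load balancer A has placed on server 0 and server 1
local_b = [5, 0]  # jobs load balancer B has placed on server 0 and server 1

# The true per-server load is the sum over both load balancers.
global_load = [a + b for a, b in zip(local_a, local_b)]

choice_a = lsq_pick(local_a)          # A's local ("selfish") choice
best_global = lsq_pick(global_load)   # choice under full observability

print("global load:", global_load)
print("A picks server", choice_a, "but the least-loaded server is", best_global)
```

In this toy case load balancer A sends its next job to server 0, which looks idle from A's partial view but is the busiest server overall. This is the kind of mismatch between local and global optima that the abstract describes, and the partial observability is what the Dec-POMDP formulation captures.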
