DC, AI, NI · Oct 27, 2021

Towards Intelligent Load Balancing in Data Centers

arXiv:2110.15788v1
Originality: Incremental advance
AI Analysis

This work addresses the problem of inefficient load balancing in data centers for service providers, but it is incremental as it builds on existing ML approaches while highlighting unresolved challenges.

The paper tackles the challenge of applying machine learning to network load balancing in data centers by proposing Aquarius, a system that bridges ML and networking for offline analysis and online deployment, resulting in improved load balancing performance.

Network load balancers are important components in data centers to provide scalable services. Workload distribution algorithms are based on heuristics, e.g., Equal-Cost Multi-Path (ECMP) and Weighted-Cost Multi-Path (WCMP), or on naive machine learning (ML) algorithms, e.g., ridge regression. Advanced ML-based approaches help achieve performance gains in different networking and system problems. However, it is challenging to apply ML algorithms to networking problems in real-life systems. Doing so requires domain knowledge to collect features from low-latency, high-throughput, and scalable networking systems, which are dynamic and heterogeneous. This paper proposes Aquarius to bridge the gap between ML and networking systems and demonstrates its usage in the context of network load balancers. The paper demonstrates Aquarius's ability to conduct both offline data analysis and online model deployment in realistic systems. The results show that the ML model trained and deployed using Aquarius improves load balancing performance, yet they also reveal more challenges to be resolved before ML can be applied to networking systems.
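To make the heuristic baselines named in the abstract concrete, the following is a minimal sketch of hash-based ECMP and WCMP server selection. It is not taken from the paper or from Aquarius: the function names (`flow_hash`, `ecmp_pick`, `wcmp_pick`), the use of SHA-256 as the flow hash, and the positive integer weights are all assumptions made for illustration.

```python
import hashlib
from bisect import bisect_right
from itertools import accumulate


def flow_hash(five_tuple):
    """Map a flow 5-tuple to a stable integer (hypothetical helper).

    SHA-256 is an assumption here; real load balancers typically use
    cheaper hardware-friendly hashes.
    """
    key = "|".join(map(str, five_tuple)).encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")


def ecmp_pick(five_tuple, servers):
    """ECMP-style choice: every server is equally likely for a new flow."""
    return servers[flow_hash(five_tuple) % len(servers)]


def wcmp_pick(five_tuple, servers, weights):
    """WCMP-style choice: a server with weight w owns w hash slots.

    Assumes non-negative integer weights with a positive sum.
    """
    cumulative = list(accumulate(weights))      # e.g. [1, 3] -> [1, 4]
    slot = flow_hash(five_tuple) % cumulative[-1]
    return servers[bisect_right(cumulative, slot)]


if __name__ == "__main__":
    servers = ["srv-a", "srv-b"]
    flow = ("10.0.0.1", 40000, "10.0.1.1", 80, "tcp")
    print(ecmp_pick(flow, servers))
    print(wcmp_pick(flow, servers, [1, 3]))
```

Because both functions depend only on the flow hash, packets of the same flow always reach the same server, and neither reacts to the actual server load. That static behavior is the limitation the ML-based approach described above aims to address.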

Code Implementations: 1 repo
