Online Learning of Weakly Coupled MDP Policies for Load Balancing and Auto Scaling
This addresses resource management in scalable systems for cloud computing or data centers, but it is incremental as it builds on existing MDP and LP methods.
The paper tackles the problem of tuning load balancers and auto scalers for dynamic resource allocation under bursty traffic by modeling it as weakly coupled Markov Decision Processes (MDPs) and solving it via a linear program (LP), with results including a more tractable relaxed LP formulation and an online two-timescale algorithm for parameter learning and policy optimization.
Load balancing and auto scaling are at the core of scalable, contemporary systems, addressing dynamic resource allocation and service rate adjustments in response to workload changes. This paper introduces a novel model and algorithms for tuning load balancers coupled with auto scalers, considering bursty traffic arriving at finite queues. We begin by presenting the problem as a weakly coupled Markov Decision Processes (MDP), solvable via a linear program (LP). However, as the number of control variables of such LP grows combinatorially, we introduce a more tractable relaxed LP formulation, and extend it to tackle the problem of online parameter learning and policy optimization using a two-timescale algorithm based on the LP Lagrangian.