Learning to Schedule in Parallel-Server Queues with Stochastic Bilinear Rewards
This addresses resource allocation in network systems, providing a method to balance reward maximization with queue stability, though it appears incremental as it builds on existing bandit and scheduling approaches.
The paper tackles scheduling in parallel-server queues with uncertain job-server rewards, aiming to maximize cumulative reward while keeping holding costs bounded for system stability. Their proposed algorithm achieves sub-linear regret and holding cost bounds of Õ(√T) over time horizon T.
We consider the problem of scheduling in multi-class, parallel-server queuing systems with uncertain rewards from job-server assignments. In this scenario, jobs incur holding costs while awaiting completion, and job-server assignments yield observable stochastic rewards with unknown mean values. The mean rewards for job-server assignments are assumed to follow a bilinear model with respect to features that characterize jobs and servers. Our objective is to minimize regret by maximizing the cumulative reward of job-server assignments over a time horizon, while keeping the total job holding cost bounded to ensure the stability of the queueing system. This problem is motivated by applications requiring resource allocation in network systems. In this problem, it is essential to control the tradeoff between reward maximization and fair allocation for the stability of the underlying queuing system (i.e., maximizing network throughput). To address this problem, we propose a scheduling algorithm based on a weighted proportional fair criteria augmented with marginal costs for reward maximization, incorporating a bandit algorithm tailored for bilinear rewards. Our algorithm achieves a sub-linear regret bound and a sub-linear mean holding cost (and queue length bound) of $\tilde{O}(\sqrt{T})$, respectively, with respect to the time horizon $T$, thus guaranteeing queuing system stability. Additionally, we establish stability conditions for distributed iterative algorithms for computing allocations, which are relevant to large-scale system applications. Finally, we demonstrate the efficiency of our algorithm through numerical experiments.