Reinforcement Learning and Regret Bounds for Admission Control
This work addresses the challenge of applying reinforcement learning to queuing systems with exponential diameter, offering improved regret bounds for admission control problems in operations research and service management.
The paper tackles the problem of high regret bounds in reinforcement learning for admission control in queuing systems, specifically an M/M/c/S queue with multiple job classes, by proposing an algorithm that leverages problem structure to achieve a regret upper bound of O(S log T + √(mT log T)) in the finite server case, and shows that dependence on buffer size S disappears in the infinite server case.
The expected regret of any reinforcement learning algorithm is lower bounded by $Ω\left(\sqrt{DXAT}\right)$ for undiscounted returns, where $D$ is the diameter of the Markov decision process, $X$ the size of the state space, $A$ the size of the action space and $T$ the number of time steps. However, this lower bound is general. A smaller regret can be obtained by taking into account some specific knowledge of the problem structure. In this article, we consider an admission control problem to an $M/M/c/S$ queue with $m$ job classes and class-dependent rewards and holding costs. Queuing systems often have a diameter that is exponential in the buffer size $S$, making the previous lower bound prohibitive for any practical use. We propose an algorithm inspired by UCRL2, and use the structure of the problem to upper bound the expected total regret by $O(S\log T + \sqrt{mT \log T})$ in the finite server case. In the infinite server case, we prove that the dependence of the regret on $S$ disappears.