The Transient Cost of Learning in Queueing Systems
This work addresses a gap in bandit learning for queueing systems: existing analyses focus on asymptotic performance and say little about transient performance. The contribution is incremental but relevant to applications such as communication networks and healthcare.
The paper tackles parameter uncertainty in queueing systems by proposing the Transient Cost of Learning in Queueing (TCLQ), a metric that quantifies the maximum increase in time-averaged queue length caused by parameter uncertainty, which captures performance in the early stages of learning. It characterizes TCLQ first for a single-queue multi-server system and then for multi-queue multi-server systems and networks of queues.
Queueing systems are widely applicable stochastic models with use cases in communication networks, healthcare, service systems, etc. Although their optimal control has been extensively studied, most existing approaches assume perfect knowledge of the system parameters. This assumption rarely holds in practice, where there is parameter uncertainty, thus motivating a recent line of work on bandit learning for queueing systems. This nascent stream of research focuses on the asymptotic performance of the proposed algorithms but does not provide insight into the transient performance in the early stages of the learning process. In this paper, we propose the Transient Cost of Learning in Queueing (TCLQ), a new metric that quantifies the maximum increase in time-averaged queue length caused by parameter uncertainty. We characterize the TCLQ of a single-queue multi-server system, and then extend these results to multi-queue multi-server systems and networks of queues. In establishing our results, we propose a unified analysis framework for TCLQ that bridges Lyapunov and bandit analysis, provides guarantees for a wide range of algorithms, and could be of independent interest.
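To make the metric concrete, the following is a minimal Monte Carlo sketch of a TCLQ-style quantity for a single queue with several servers. Everything specific here is an assumption for illustration and is not taken from the text above: the discrete-time model with Bernoulli arrivals and Bernoulli services, the use of a standard UCB1 index as the learning policy, the function name `estimate_tclq`, and the choice to couple the learner and the oracle through shared random numbers. The estimate is the largest gap, over horizons T, between the time-averaged queue length under the learner and under an oracle that always uses the best server.

```python
import math
import random


def estimate_tclq(lam, mus, horizon, runs=200, seed=0):
    """Hypothetical helper: estimate max over T of the gap in time-averaged
    queue length between a UCB1 learner and an oracle that knows the best server.

    lam  : arrival probability per slot (assumed Bernoulli arrivals)
    mus  : service success probability of each server (assumed Bernoulli)
    """
    rng = random.Random(seed)
    k = len(mus)
    best = max(range(k), key=lambda i: mus[i])
    # gap_sum[t] accumulates, over runs, the cumulative queue-length gap up to slot t.
    gap_sum = [0.0] * horizon

    for _ in range(runs):
        q_learn = q_opt = 0
        pulls = [0] * k
        wins = [0] * k
        cum_gap = 0
        for t in range(horizon):
            arrival = 1 if rng.random() < lam else 0
            u = rng.random()  # shared service randomness couples both systems
            q_learn += arrival
            q_opt += arrival

            if q_learn > 0:
                untried = [i for i in range(k) if pulls[i] == 0]
                if untried:
                    arm = untried[0]
                else:
                    n = sum(pulls)
                    arm = max(
                        range(k),
                        key=lambda i: wins[i] / pulls[i]
                        + math.sqrt(2 * math.log(n) / pulls[i]),
                    )
                success = 1 if u < mus[arm] else 0
                pulls[arm] += 1
                wins[arm] += success
                q_learn -= success

            if q_opt > 0 and u < mus[best]:
                q_opt -= 1

            cum_gap += q_learn - q_opt
            gap_sum[t] += cum_gap

    # Average over runs, divide by the horizon T = t + 1, then take the worst T.
    return max(gap_sum[t] / (runs * (t + 1)) for t in range(horizon))


if __name__ == "__main__":
    print(estimate_tclq(lam=0.5, mus=[0.9, 0.6, 0.55], horizon=500, runs=50))
```

Because both systems see the same arrivals and the same service draw in each slot, the oracle's queue is never longer than the learner's on any sample path, so the estimated gap is nonnegative. With a single server there is nothing to learn and the estimate is exactly zero.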