Learning payoffs while routing in skill-based queues
This addresses routing optimization in service systems where payoff parameters are unknown, though it appears incremental as it builds on existing queueing and learning frameworks.
The paper tackles the problem of routing customers to servers with unknown payoff parameters in skill-based queueing systems, and presents a machine learning algorithm that achieves polylogarithmic regret and is asymptotically optimal up to logarithmic terms.
Motivated by applications in service systems, we consider queueing systems where each customer must be handled by a server with the right skill set. We focus on optimizing the routing of customers to servers in order to maximize the total payoff of customer--server matches. In addition, customer--server dependent payoff parameters are assumed to be unknown a priori. We construct a machine learning algorithm that adaptively learns the payoff parameters while maximizing the total payoff and prove that it achieves polylogarithmic regret. Moreover, we show that the algorithm is asymptotically optimal up to logarithmic terms by deriving a regret lower bound. The algorithm leverages the basic feasible solutions of a static linear program as the action space. The regret analysis overcomes the complex interplay between queueing and learning by analyzing the convergence of the queue length process to its stationary behavior. We also demonstrate the performance of the algorithm numerically, and have included an experiment with time-varying parameters highlighting the potential of the algorithm in non-static environments.