ML LGOct 14, 2024

Queueing Matching Bandits with Preference Feedback

arXiv:2410.10098v212.06 citationsh-index: 1Has CodeNIPS

Originality Incremental advance

AI Analysis

This addresses queue management in systems like ride-sharing or job scheduling, but it is incremental as it builds on existing bandit algorithms like UCB and Thompson Sampling.

The paper tackles the problem of stabilizing multi-class multi-server queueing systems with unknown service rates by learning server preferences, achieving an average queue length bound of O(min{N,K}/ε) and sublinear regret bounds of Õ(min{√T Q_max, T^{3/4}}).

In this study, we consider multi-class multi-server asymmetric queueing systems consisting of $N$ queues on one side and $K$ servers on the other side, where jobs randomly arrive in queues at each time. The service rate of each job-server assignment is unknown and modeled by a feature-based Multi-nomial Logit (MNL) function. At each time, a scheduler assigns jobs to servers, and each server stochastically serves at most one job based on its preferences over the assigned jobs. The primary goal of the algorithm is to stabilize the queues in the system while learning the service rates of servers. To achieve this goal, we propose algorithms based on UCB and Thompson Sampling, which achieve system stability with an average queue length bound of $O(\min\{N,K\}/ε)$ for a large time horizon $T$, where $ε$ is a traffic slackness of the system. Furthermore, the algorithms achieve sublinear regret bounds of $\tilde{O}(\min\{\sqrt{T} Q_{\max},T^{3/4}\})$, where $Q_{\max}$ represents the maximum queue length over agents and times. Lastly, we provide experimental results to demonstrate the performance of our algorithms.

View on arXiv PDF Code

Similar