PRApr 5
A new $1/(1-Ï)$-scaling bound for multiserver queues via a leave-one-out techniqueYige Hong
Bounding the queue length in a multiserver queue is a central challenge in queueing theory. Even for the classical $G/G/n$ queue with homogeneous servers, it is highly non-trivial to derive a simple and accurate bound for the steady-state queue length that holds for all problem parameters. A recent breakthrough by Li and Goldberg (2025) establishes a universal bound of order $O(1/(1-Ï))$ that holds for any load $Ï< 1$ and any number of servers $n$. This order is tight in many well-known scaling regimes, including classical heavy-traffic, Halfin-Whitt and Nondegenerate-Slowdown. However, their bounds entail large constant factors and a highly intricate proof, suggesting room for further improvement. In this paper, we present a new universal bound of order $O(1/(1-Ï))$ for the $G/G/n$ queue. Our bound, while restricted to the light-tailed case and the first moment of the queue length, has a more interpretable and often tighter leading constant. Our proof is relatively simple, utilizing a modified $G/G/n$ queue, the stationarity of a quadratic test function, and a novel leave-one-out coupling technique. Finally, we also extend our method to $G/G/n$ queues with fully heterogeneous service-time distributions.
LGFeb 8, 2024
Unichain and Aperiodicity are Sufficient for Asymptotic Optimality of Average-Reward Restless BanditsYige Hong, Qiaomin Xie, Yudong Chen et al.
We consider the infinite-horizon, average-reward restless bandit problem in discrete time. We propose a new class of policies that are designed to drive a progressively larger subset of arms toward the optimal distribution. We show that our policies are asymptotically optimal with an $O(1/\sqrt{N})$ optimality gap for an $N$-armed problem, assuming only a unichain and aperiodicity assumption. Our approach departs from most existing work that focuses on index or priority policies, which rely on the Global Attractor Property (GAP) to guarantee convergence to the optimum, or a recently developed simulation-based policy, which requires a Synchronization Assumption (SA).
LGFeb 9, 2025
Projection-based Lyapunov method for fully heterogeneous weakly-coupled MDPsXiangcheng Zhang, Yige Hong, Weina Wang
Heterogeneity poses a fundamental challenge for many real-world large-scale decision-making problems but remains largely understudied. In this paper, we study the fully heterogeneous setting of a prominent class of such problems, known as weakly-coupled Markov decision processes (WCMDPs). Each WCMDP consists of $N$ arms (or subproblems), which have distinct model parameters in the fully heterogeneous setting, leading to the curse of dimensionality when $N$ is large. We show that, under mild assumptions, an efficiently computable policy achieves an $O(1/\sqrt{N})$ optimality gap in the long-run average reward per arm for fully heterogeneous WCMDPs as $N$ becomes large. This is the first asymptotic optimality result for fully heterogeneous average-reward WCMDPs. Our main technical innovation is the construction of projection-based Lyapunov functions that certify the convergence of rewards and costs to an optimal region, even under full heterogeneity.
LGMay 31, 2023
Restless Bandits with Average Reward: Breaking the Uniform Global Attractor AssumptionYige Hong, Qiaomin Xie, Yudong Chen et al.
We study the infinite-horizon restless bandit problem with the average reward criterion, in both discrete-time and continuous-time settings. A fundamental goal is to efficiently compute policies that achieve a diminishing optimality gap as the number of arms, $N$, grows large. Existing results on asymptotic optimality all rely on the uniform global attractor property (UGAP), a complex and challenging-to-verify assumption. In this paper, we propose a general, simulation-based framework, Follow-the-Virtual-Advice, that converts any single-armed policy into a policy for the original $N$-armed problem. This is done by simulating the single-armed policy on each arm and carefully steering the real state towards the simulated state. Our framework can be instantiated to produce a policy with an $O(1/\sqrt{N})$ optimality gap. In the discrete-time setting, our result holds under a simpler synchronization assumption, which covers some problem instances that violate UGAP. More notably, in the continuous-time setting, we do not require \emph{any} additional assumptions beyond the standard unichain condition. In both settings, our work is the first asymptotic optimality result that does not require UGAP.