LGOCPRFeb 9, 2025

Projection-based Lyapunov method for fully heterogeneous weakly-coupled MDPs

arXiv:2502.06072v51 citationsh-index: 5
Originality Highly original
AI Analysis

This addresses scalability issues in large-scale decision-making problems for applications like resource allocation, providing the first asymptotic optimality result for fully heterogeneous average-reward WCMDPs.

The paper tackles the challenge of heterogeneity in weakly-coupled Markov decision processes (WCMDPs) with distinct model parameters across arms, showing that an efficiently computable policy achieves an O(1/√N) optimality gap in long-run average reward per arm as N becomes large.

Heterogeneity poses a fundamental challenge for many real-world large-scale decision-making problems but remains largely understudied. In this paper, we study the fully heterogeneous setting of a prominent class of such problems, known as weakly-coupled Markov decision processes (WCMDPs). Each WCMDP consists of $N$ arms (or subproblems), which have distinct model parameters in the fully heterogeneous setting, leading to the curse of dimensionality when $N$ is large. We show that, under mild assumptions, an efficiently computable policy achieves an $O(1/\sqrt{N})$ optimality gap in the long-run average reward per arm for fully heterogeneous WCMDPs as $N$ becomes large. This is the first asymptotic optimality result for fully heterogeneous average-reward WCMDPs. Our main technical innovation is the construction of projection-based Lyapunov functions that certify the convergence of rewards and costs to an optimal region, even under full heterogeneity.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes