A Stackelberg Game Framework with Drainability Guardrails for Pricing and Scaling in Multi-Tenant GPU Cloud Platforms

Junji Yan, Asrin Efe Yorulmaz, Hanchen Zhou, Tamer Başar

arXiv:2604.16802 · 60.6 · h-index: 9

AI Analysis

For operators of multi-tenant GPU cloud platforms, this work provides a theoretical framework and practical guardrail to prevent instability in dynamic pricing and scaling, though the contribution is incremental as it extends Stackelberg game theory to this specific domain.

The paper addresses joint pricing and scaling in multi-tenant GPU cloud platforms, with the goal of meeting latency SLOs while controlling costs. It identifies a structural failure mode in which delay-insensitive workloads sustain a residual demand floor that makes the backlog undrainable, and it proposes a drainability guardrail that ensures stability; empirical results show improved safety and robustness for model-free RL.
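A hedged illustration of why a residual demand floor makes the backlog undrainable. The demand map below is a toy model assumed only for exposition, not the equilibrium demand map derived in the paper: a share of tenants ignores delay, the remaining tenants pay a linear delay cost, tenant values are uniform on an interval, and delay is proxied by backlog divided by capacity. All symbols (Λ, ρ₀, v̄, θ, ε, D) are hypothetical.

```latex
% Toy demand map (assumed): delay-insensitive share \rho_0, values uniform on [0,\bar v]
\lambda(p,q,\mu) = \Lambda\rho_0\Big(1-\frac{p}{\bar v}\Big)^{+}
  + \Lambda(1-\rho_0)\Big(1-\frac{p+\theta q/\mu}{\bar v}\Big)^{+},
\qquad q_{t+1} = \max\{0,\; q_t + \lambda(p,q_t,\mu) - \mu\}.

% Residual-demand regime: for q \ge \bar q := \mu(\bar v - p)/\theta the second term vanishes
D(q) := \lambda(p,q,\mu) - \mu = \lambda_{\mathrm{floor}}(p) - \mu,
\qquad \lambda_{\mathrm{floor}}(p) := \Lambda\rho_0\Big(1-\frac{p}{\bar v}\Big)^{+}.

% Toy guardrail: a margin \varepsilon > 0 gives uniformly negative drift
\lambda_{\mathrm{floor}}(p) \le \mu - \varepsilon
  \;\Longrightarrow\; D(q) \le -\varepsilon \quad \forall\, q \ge \bar q.

% Failure mode: \lambda \ge \lambda_{\mathrm{floor}} and \lambda_{\mathrm{floor}} is nonincreasing in p, so
\lambda_{\mathrm{floor}}(p_{\max}) \ge \mu_{\max}
  \;\Longrightarrow\; D(q) \ge 0 \;\text{ and }\; q_{t+1} \ge q_t
  \quad \forall\, q_t \ge 0,\ p \le p_{\max},\ \mu \le \mu_{\max}.
```

In this toy setting, no admissible price-capacity pair can drain the backlog once the demand floor at the maximum price reaches the maximum capacity, which is the qualitative failure mode the summary describes.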

Modern Graphics Processing Unit (GPU)-backed services must satisfy strict latency service-level objectives (SLOs) while controlling spare-capacity cost. In multi-tenant GPU cloud platforms, this trade-off is inherently dynamic because workload demand is endogenous: pricing shapes the submissions of heterogeneous tenants, which in turn drive congestion and delay. We formulate the joint pricing-and-scaling problem as a large-population Stackelberg game and derive an explicit equilibrium demand map. The resulting closed-loop model reveals a structural failure mode in which delay-insensitive workloads sustain a residual demand floor, making the backlog undrainable under bounded price and service capacity. This observation motivates a computable drainability guardrail that certifies uniformly negative drift in the residual-demand regime. For any fixed price-capacity pair satisfying the drainability guardrail, we establish a unique operating point and global convergence towards it under a checkable step-size condition. Building on this fixed-pair analysis, we further develop an optimizer-agnostic action shield for the full dynamic problem and show empirically that it improves safety and robustness for model-free reinforcement learning (RL) in this setting.
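The fixed-pair convergence result and the optimizer-agnostic action shield can be sketched in Python under toy assumptions. Everything here is hypothetical: the `ToyMarket` class and its parameter values, the delay proxy `q / mu`, the damped update with step size `eta`, the condition `eta * L <= 1`, and the price-first repair rule in `shield` are illustrative choices, not the paper's equilibrium map, step-size condition, or shield.

```python
"""Toy sketch: drainability check, fixed-pair backlog iteration, action shield.

Hypothetical model for illustration only; not the paper's equilibrium demand map.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyMarket:
    # All parameter names and values are illustrative assumptions.
    arrival_scale: float = 10.0    # Lambda: potential submissions per step
    insensitive_frac: float = 0.4  # rho_0: share of delay-insensitive tenants
    value_max: float = 1.0         # tenant values uniform on [0, value_max]
    delay_cost: float = 0.5        # theta: per-unit-delay cost, sensitive tenants
    p_max: float = 0.9             # upper bound on price
    mu_max: float = 8.0            # upper bound on capacity

    def demand(self, p: float, q: float, mu: float) -> float:
        """Toy demand: a tenant submits if its value covers price (+ delay cost)."""
        delay = q / mu  # crude delay proxy; requires mu > 0
        insensitive = max(0.0, 1.0 - p / self.value_max)
        sensitive = max(0.0, 1.0 - (p + self.delay_cost * delay) / self.value_max)
        return self.arrival_scale * (self.insensitive_frac * insensitive
                                     + (1.0 - self.insensitive_frac) * sensitive)

    def demand_floor(self, p: float) -> float:
        """Residual demand once the backlog is large: insensitive tenants only."""
        return (self.arrival_scale * self.insensitive_frac
                * max(0.0, 1.0 - p / self.value_max))

    def drainable(self, p: float, mu: float, eps: float, tol: float = 1e-9) -> bool:
        """Toy guardrail: residual demand sits at least eps below capacity."""
        return self.demand_floor(p) <= mu - eps + tol


def operating_point(m: ToyMarket, p: float, mu: float, eta: float,
                    q0: float = 0.0, iters: int = 10_000) -> float:
    """Damped update q <- max(0, q + eta * (demand - mu)) for a fixed (p, mu).

    Toy step-size condition: demand is nonincreasing in q with slope at least
    -L, L = Lambda * (1 - rho_0) * theta / (value_max * mu). If eta * L <= 1
    the update map is nondecreasing, so iterates approach the fixed point
    monotonically; residual demand strictly below capacity makes it exist.
    """
    if m.demand_floor(p) >= mu:
        raise ValueError("pair is not drainable: residual demand >= capacity")
    lip = (m.arrival_scale * (1.0 - m.insensitive_frac) * m.delay_cost
           / (m.value_max * mu))
    if eta * lip > 1.0:
        raise ValueError("step size violates the toy condition eta * L <= 1")
    q = q0
    for _ in range(iters):
        q = max(0.0, q + eta * (m.demand(p, q, mu) - mu))
    return q


def shield(m: ToyMarket, p: float, mu: float, eps: float) -> tuple[float, float]:
    """Wrap any optimizer's proposed (p, mu): return it if drainable, else repair it."""
    p = min(max(p, 0.0), m.p_max)
    mu = min(max(mu, 0.0), m.mu_max)
    if m.drainable(p, mu, eps):
        return p, mu
    # Repair 1: smallest price that meets the guardrail at this capacity
    # (the floor is linear and decreasing in p on [0, value_max]).
    base = m.arrival_scale * m.insensitive_frac
    if base > 0.0:
        p_new = min(m.p_max, m.value_max * (1.0 - (mu - eps) / base))
        if m.drainable(p_new, mu, eps):
            return p_new, mu
    # Repair 2: price at its bound, capacity raised to cover floor + margin.
    mu_new = min(m.mu_max, m.demand_floor(m.p_max) + eps)
    if m.drainable(m.p_max, mu_new, eps):
        return m.p_max, max(mu, mu_new)
    raise RuntimeError("undrainable: no bounded (p, mu) meets the guardrail margin")
```

With the default parameters, `operating_point(ToyMarket(), p=0.5, mu=4.0, eta=1.0)` approaches a backlog of 4/3 from any nonnegative starting point, and `shield(ToyMarket(), 0.1, 2.0, eps=0.5)` raises the price to 0.625 while leaving capacity unchanged. Repairing price before capacity is one arbitrary design choice; because the shield only post-processes a proposed action, it can wrap any optimizer, including a model-free RL policy.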
