Multiserver-job Response Time under Multilevel Scaling
This work addresses performance modeling for cloud computing and data centers and offers insight into job scheduling and resource allocation. Its contribution is incremental: it builds on existing multiserver-job theory under specific scaling assumptions.
The paper analyzes response times in multiserver-job systems under multilevel scaling, where load approaches capacity much faster than the number of servers $n$ grows. It focuses on a '1 and n' system, in which each job requires either one server or all $n$ servers. For each of three load regimes (load dominated by $n$-server jobs, load dominated by 1-server jobs, and balanced load), it characterizes the asymptotic growth rate of the stability-region boundary and of the scaled mean queue length. Theoretical, numerical, and simulation results show that mean queue length peaks near balanced load.
We study the multiserver-job setting in the load-focused multilevel scaling limit, where system load approaches capacity much faster than the growth of the number of servers $n$. We consider the ``1 and $n$'' system, where each job requires either one server or all $n$. Within the multilevel scaling limit, we examine three regimes: load dominated by $n$-server jobs, 1-server jobs, or balanced. In each regime, we characterize the asymptotic growth rate of the boundary of the stability region and the scaled mean queue length. We demonstrate that mean queue length peaks near balanced load via theory, numerics, and simulation.
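To make the ``1 and $n$'' model concrete, the following is a minimal simulation sketch, not the paper's method. The abstract does not state a scheduling policy or stochastic assumptions, so everything specific here is an assumption chosen for illustration: first-come-first-served (FCFS) service with head-of-line blocking, Poisson arrivals at rate `lam`, exponential service at rates `mu_1` and `mu_n`, and "queue length" taken to mean the number of jobs in the system. The function name `simulate_one_and_n` and all parameter names are hypothetical.

```python
import random
from collections import deque


def simulate_one_and_n(n, lam, p_n, mu_1=1.0, mu_n=1.0,
                       num_events=200_000, seed=0):
    """Estimate the time-average number of jobs in a '1 and n' system.

    Assumptions (not taken from the paper): FCFS with head-of-line
    blocking, Poisson arrivals at rate lam, each arrival needs all n
    servers with probability p_n (else one server), and service times
    are exponential with rate mu_n or mu_1 respectively.
    """
    rng = random.Random(seed)
    queue = deque()    # waiting jobs in arrival order; True = n-server job
    small_busy = 0     # number of 1-server jobs in service
    big_busy = False   # True while an n-server job holds every server
    area = 0.0         # time integral of the number of jobs in system
    clock = 0.0

    for _ in range(num_events):
        rate_small = small_busy * mu_1
        rate_big = mu_n if big_busy else 0.0
        total = lam + rate_small + rate_big

        # Advance time to the next event and accumulate the integral.
        dt = rng.expovariate(total)
        in_system = len(queue) + small_busy + (1 if big_busy else 0)
        area += in_system * dt
        clock += dt

        # Pick the event in proportion to its rate.
        u = rng.random() * total
        if u < lam:
            queue.append(rng.random() < p_n)   # arrival
        elif u < lam + rate_small:
            small_busy -= 1                    # a 1-server job departs
        else:
            big_busy = False                   # the n-server job departs

        # FCFS admission: only the head of the queue may enter service.
        while queue and not big_busy:
            head_is_big = queue[0]
            if not head_is_big and small_busy < n:
                small_busy += 1
            elif head_is_big and small_busy == 0:
                big_busy = True
            else:
                break                          # head does not fit; it blocks
            queue.popleft()

    return area / clock


if __name__ == "__main__":
    # Sweep the job mix at a fixed arrival rate (illustrative values only).
    for p_n in (0.1, 0.5, 0.9):
        print(p_n, simulate_one_and_n(n=8, lam=0.6, p_n=p_n))
```

With `p_n=1.0` every job needs all $n$ servers, so under these assumptions the model reduces to an M/M/1 queue, which gives a simple sanity check against the known mean $\rho/(1-\rho)$. Parameter values that approach the stability boundary need far more events for the estimate to settle.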