DC, Mar 20, 2025

The Merit of Simple Policies: Buying Performance With Parallelism and System Architecture

Mert Yildiz, Alexey Rolich, Andrea Baiocchi

arXiv:2503.16166, 3 citations, h-index: 5

AI Analysis

For cloud system designers, this work demonstrates that parallelism and architecture can be more impactful than sophisticated scheduling policies, challenging conventional wisdom.

Using Google's cloud workload data, the authors show that under a fixed computational budget, mean job response time is minimized at an optimal cluster size, and simple policies like Join Idle Queue match complex size-based policies at high parallelism. Multi-stage clusters with simple Round Robin outperform size-based policies.

While scheduling and dispatching of computational workloads is a well-investigated subject, only recently has Google publicly released a vast, high-resolution measurement dataset of its cloud workloads. We revisit dispatching and scheduling algorithms fed by traffic workloads derived from those measurements. The main finding is that mean job response time attains a minimum as the number of servers in the computing cluster is varied, under the constraint that the overall computational budget is kept constant. Moreover, simple policies, such as Join Idle Queue, appear to attain the same performance as more complex, size-based policies for suitably high degrees of parallelism. Better performance still, clearly exceeding that of size-based dispatching policies, is obtained with multi-stage server clusters, even under very simple policies such as Round Robin. The takeaway is that the parallelism and architecture of computing systems might be more powerful knobs for controlling performance than policies are, under realistic workload traffic.
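To make the experimental setup concrete, the following is a minimal sketch of the kind of comparison the abstract describes: a cluster of n servers sharing a fixed total capacity, with jobs dispatched by Round Robin or Join Idle Queue. It is not the authors' simulator and does not use the Google traces. The function `simulate`, its parameters, the FCFS service discipline, the Poisson arrivals, and the Pareto job sizes are all assumptions chosen for illustration, so the numbers it produces should not be read as reproducing the paper's results.

```python
import random


def simulate(n_servers, policy, total_capacity=1.0, n_jobs=20000,
             load=0.7, seed=0):
    """Mean job response time for n FCFS servers under a fixed capacity budget.

    Hypothetical sketch: synthetic workload, not the Google trace data.
    """
    workload_rng = random.Random(seed)      # arrivals and job sizes
    dispatch_rng = random.Random(seed + 1)  # random choices made by the dispatcher

    # Fixed computational budget: each server runs at total_capacity / n.
    speed = total_capacity / n_servers
    arrival_rate = load * total_capacity    # job sizes are scaled to mean 1

    free_at = [0.0] * n_servers  # time at which each server clears its backlog
    now = 0.0
    rr_next = 0
    total_response = 0.0

    for _ in range(n_jobs):
        now += workload_rng.expovariate(arrival_rate)
        # Pareto(shape 2.5) sizes, rescaled so that the mean is 1 (assumption).
        size = workload_rng.paretovariate(2.5) * 0.6

        if policy == "rr":
            # Round Robin: cycle through the servers in order.
            target = rr_next
            rr_next = (rr_next + 1) % n_servers
        elif policy == "jiq":
            # Join Idle Queue: pick an idle server if one exists,
            # otherwise fall back to a uniformly random server.
            idle = [i for i, t in enumerate(free_at) if t <= now]
            if idle:
                target = dispatch_rng.choice(idle)
            else:
                target = dispatch_rng.randrange(n_servers)
        else:
            raise ValueError(f"unknown policy: {policy}")

        start = max(now, free_at[target])          # FCFS: wait for the backlog
        free_at[target] = start + size / speed
        total_response += free_at[target] - now

    return total_response / n_jobs


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16, 32):
        rr = simulate(n, "rr")
        jiq = simulate(n, "jiq")
        print(f"n={n:3d}  RR={rr:8.3f}  JIQ={jiq:8.3f}")
```

Using separate random generators for the workload and for the dispatcher means both policies see the same arrival times and job sizes for a given seed, which makes the comparison between them less noisy. Sweeping `n_servers` while `total_capacity` stays fixed mirrors the constant-budget constraint stated in the abstract. Whether a minimum appears, and where, depends on the workload and service discipline assumed here.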
