DC | LG | Jan 18, 2024

Towards providing reliable job completion time predictions using PCS

arXiv:2401.10354v1 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses the problem of unpredictable job completion times for cloud users. It offers a practical scheduler that trades predictability against performance and fairness, though the advance is incremental.

The paper tackles the challenge of providing reliable job completion time predictions in cloud scheduling, a goal that often conflicts with performance and fairness. It introduces PCS, a scheduling framework that uses Weighted-Fair-Queueing and simulation-aided search to balance these objectives, achieving accurate predictions with only marginal compromises in performance and fairness.

In this paper, we build a case for providing job completion time predictions to cloud users, similar to the delivery date of a package or the arrival time of a booked ride. Our analysis reveals that providing predictability can come at the expense of performance and fairness. Existing cloud scheduling systems optimize for extreme points in the trade-off space, making them either extremely unpredictable or impractical. To address this challenge, we present PCS, a new scheduling framework that aims to provide predictability while balancing other traditional objectives. The key idea behind PCS is to use Weighted-Fair-Queueing (WFQ) and find a suitable configuration of different WFQ parameters (e.g., class weights) that meets specific goals for predictability. It uses a simulation-aided search strategy to efficiently discover WFQ configurations that lie on the Pareto front of the trade-off space between these objectives. We implement and evaluate PCS in the context of DNN job scheduling on GPUs. Our evaluation, on a small-scale GPU testbed and larger-scale simulations, shows that PCS can provide accurate completion time estimates while marginally compromising on performance and fairness.
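The idea of keeping only WFQ configurations that lie on the Pareto front can be illustrated with a minimal Python sketch. It is not the PCS implementation: the `Candidate` type, its field names, and all numbers are hypothetical, and it assumes a simulator has already scored each candidate on three objectives where lower is better.

```python
from typing import List, NamedTuple, Tuple


class Candidate(NamedTuple):
    """A hypothetical WFQ configuration with simulated scores (lower is better)."""
    weights: Tuple[float, ...]  # assumed per-class WFQ weights
    pred_error: float           # assumed completion-time prediction error
    avg_jct: float              # assumed average job completion time
    unfairness: float           # assumed unfairness score


def objectives(c: Candidate) -> Tuple[float, float, float]:
    return (c.pred_error, c.avg_jct, c.unfairness)


def dominates(a: Candidate, b: Candidate) -> bool:
    """a dominates b if it is no worse on every objective and better on one."""
    pairs = list(zip(objectives(a), objectives(b)))
    return all(x <= y for x, y in pairs) and any(x < y for x, y in pairs)


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


# Made-up scores for illustration only; these are not results from the paper.
candidates = [
    Candidate((0.5, 0.5), pred_error=0.10, avg_jct=100.0, unfairness=0.2),
    Candidate((0.7, 0.3), pred_error=0.05, avg_jct=120.0, unfairness=0.3),
    Candidate((0.9, 0.1), pred_error=0.12, avg_jct=130.0, unfairness=0.4),
]
front = pareto_front(candidates)
```

Here the third candidate is dropped because the first is at least as good on every objective. The first two remain because each is better on a different objective, so choosing between them depends on the operator's predictability goal.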

Code Implementations: 1 repo