DC · AI · Feb 27, 2024

Compass: A Decentralized Scheduler for Latency-Sensitive ML Workflows

Yuting Yang, Andrea Merlina, Weijia Song, Tiancheng Yuan, Ken Birman, Roman Vitenberg

arXiv:2402.17652v2 · 1.2 · h-index: 3

Originality: Incremental advance

AI Analysis

This work addresses scheduling bottlenecks in distributed ML applications such as image processing and natural language processing. It appears incremental, since it builds on existing scheduling approaches.

The paper tackles the problem of scheduling latency-sensitive ML workflows on distributed GPU systems by proposing Compass, a decentralized scheduler that unifies GPU memory management and task placement. Compared with other state-of-the-art schedulers, Compass significantly reduces job completion times while using the same or fewer resources; in one case, it needed only half the servers to process the same workload.

We consider ML query processing in distributed systems where GPU-enabled workers coordinate to execute complex queries, a computing style often seen in applications that interact with users in support of image processing and natural language processing. In such systems, coscheduling of GPU memory management and task placement represents a promising opportunity. We propose Compass, a novel framework that unifies these functions to reduce job latency while using resources efficiently: it places tasks where data dependencies will be satisfied, collocates tasks from the same job (when this will not overload the host or its GPU), and efficiently manages GPU memory. Comparison with other state-of-the-art schedulers shows a significant reduction in completion times while requiring the same amount of resources or even fewer. In one case, just half the servers were needed to process the same workload.
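The abstract couples three placement signals: data locality, collocation of same-job tasks, and GPU memory state. The sketch below is a hypothetical, simplified illustration of how such signals could be combined in one placement decision. It is not Compass's actual algorithm. The `Task`/`Worker` classes, the scoring weights, the queue-length overload threshold, and the LRU model-eviction policy are all assumptions chosen for illustration.

```python
from collections import OrderedDict
from dataclasses import dataclass, field


@dataclass
class Task:
    job_id: str
    model: str                                   # model that must be resident in GPU memory
    model_mb: int                                # hypothetical model size in MB
    inputs_on: set = field(default_factory=set)  # workers already holding predecessor outputs


@dataclass
class Worker:
    name: str
    gpu_mb: int                                  # assumed GPU memory capacity in MB
    max_queue: int = 4                           # hypothetical overload threshold
    queue: list = field(default_factory=list)
    cache: OrderedDict = field(default_factory=OrderedDict)  # model -> MB, in LRU order

    def used_mb(self):
        return sum(self.cache.values())

    def load_model(self, model, size):
        """Make `model` resident, evicting least-recently-used models if needed (assumed policy)."""
        if model in self.cache:
            self.cache.move_to_end(model)        # mark as most recently used
            return
        while self.cache and self.used_mb() + size > self.gpu_mb:
            self.cache.popitem(last=False)       # evict the least-recently-used model
        self.cache[model] = size


def score(worker, task):
    """Higher is better. The weights are illustrative assumptions, not values from the paper."""
    s = 0.0
    if task.model in worker.cache:
        s += 4                                   # avoids loading the model onto the GPU
    if worker.name in task.inputs_on:
        s += 2                                   # data dependency already satisfied locally
    if any(t.job_id == task.job_id for t in worker.queue):
        s += 1                                   # collocate with tasks from the same job
    return s - 0.5 * len(worker.queue)           # penalize busier workers


def place(task, workers):
    """Choose a non-overloaded worker that can fit the model, load it, and enqueue the task."""
    candidates = [w for w in workers
                  if len(w.queue) < w.max_queue and task.model_mb <= w.gpu_mb]
    if not candidates:
        return None                              # all workers saturated; the caller would retry
    best = max(candidates, key=lambda w: score(w, task))  # ties go to the first candidate
    best.load_model(task.model, task.model_mb)
    best.queue.append(task)
    return best


workers = [Worker("w0", gpu_mb=16000), Worker("w1", gpu_mb=16000)]
t1 = Task("job-A", "detector", 6000)
t2 = Task("job-A", "captioner", 8000, inputs_on={"w0"})  # consumes t1's output on w0
t3 = Task("job-B", "translator", 10000)
placed = [place(t, workers).name for t in (t1, t2, t3)]
print(placed)  # ['w0', 'w0', 'w1']
```

In this toy run, `t2` follows `t1` onto `w0` because its input already lives there and it belongs to the same job. `t3`, from a different job and with no cached model anywhere, goes to the idle `w1`. The load penalty and `max_queue` cap stand in for the abstract's condition that collocation must not overload the host or its GPU.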

