DC · AI · LG · Aug 23, 2022

Asynchronous Execution of Heterogeneous Tasks in ML-driven HPC Workflows

arXiv:2208.11069v2 · 2 citations · h-index: 47
Originality: Incremental advance
AI Analysis

The paper addresses resource efficiency for HPC users, but its contribution is incremental because it builds on existing workflow scheduling concepts.

The paper tackles the problem of improving resource utilization and reducing makespan in ML-driven HPC workflows by investigating asynchronous execution of heterogeneous tasks. Experiments on Summit show that the performance improvements are consistent with the authors' model.

Heterogeneous scientific workflows consist of numerous types of tasks that require executing on heterogeneous resources. Asynchronous execution of those tasks is crucial to improve resource utilization and task throughput and to reduce workflows' makespan. Therefore, middleware capable of scheduling and executing different task types across heterogeneous resources must enable asynchronous execution of tasks. In this paper, we investigate the requirements and properties of the asynchronous task execution of machine learning (ML)-driven high performance computing (HPC) workflows. We model the degree of asynchronicity permitted for arbitrary workflows and propose key metrics that can be used to determine qualitative benefits when employing asynchronous execution. Our experiments represent relevant scientific drivers; we perform them at scale on Summit and show that the performance enhancements due to asynchronous execution are consistent with our model.

