LG · DC · Dec 1, 2020

Python Workflows on HPC Systems

arXiv:2012.00365v1 · 3 citations
AI Analysis

This work addresses the practical challenges of deploying and managing Python workflows for Deep Learning researchers and practitioners on HPC systems, offering potential workarounds for maintaining stable and secure environments.

This paper analyzes the challenges of using Python for compute-intensive machine learning and data analytics on HPC systems, specifically focusing on Deep Learning applications on GPU clusters. It identifies key problems related to multi-user environments, parallel programming, and resource management.

The recent successes and widespread application of compute-intensive machine learning and data analytics methods have boosted the usage of the Python programming language on HPC systems. While Python offers users many advantages, it was not designed with a focus on multi-user environments or parallel programming, which makes it quite challenging to maintain stable and secure Python workflows on an HPC system. In this paper, we analyze the key problems induced by the usage of Python on HPC clusters and sketch appropriate workarounds for efficiently maintaining multi-user Python software environments, securing and restricting the resources of Python jobs, and containing Python processes, with a focus on Deep Learning applications running on GPU clusters.

