LG · Nov 23, 2020

Integrating Deep Learning in Domain Sciences at Exascale

arXiv:2011.11188v1 · 5 citations · Has Code
AI Analysis

This paper matters to domain scientists and HPC practitioners who need to integrate deep learning into existing high-performance computing simulations. It offers an incremental path to that integration through MagmaDNN, an open-source HPC deep learning framework.

This paper addresses challenges in integrating deep learning with high-performance computing (HPC) simulations, proposing asynchronous parallelization and optimization techniques for large-scale heterogeneous and exascale systems. These techniques are implemented in MagmaDNN, an open-source HPC deep learning framework, which aims to provide better integration with existing HPC workflows than frameworks targeted at data scientists.

This paper presents some of the current challenges in designing deep learning artificial intelligence (AI) and integrating it with traditional high-performance computing (HPC) simulations. We evaluate existing packages for their ability to run deep learning models and applications on large-scale HPC systems efficiently, identify challenges, and propose new asynchronous parallelization and optimization techniques for current large-scale heterogeneous systems and upcoming exascale systems. These developments, along with existing HPC AI software capabilities, have been integrated into MagmaDNN, an open-source HPC deep learning framework. Many deep learning frameworks are targeted at data scientists and fall short in providing quality integration into existing HPC workflows. This paper discusses the necessities of an HPC deep learning framework and how those needs can be provided (e.g., as in MagmaDNN) through a deep integration with existing HPC libraries, such as MAGMA and its modular memory management, MPI, CuBLAS, CuDNN, MKL, and HIP. Advancements are also illustrated through the use of algorithmic enhancements in reduced- and mixed-precision, as well as asynchronous optimization methods. Finally, we present illustrations and potential solutions for enhancing traditional compute- and data-intensive applications at ORNL and UTK with AI. The approaches and future challenges are illustrated in materials science, imaging, and climate applications.
