DC LG PF · Sep 11, 2020

Hierarchical Roofline Performance Analysis for Deep Learning Applications

Charlene Yang, Yunsong Wang, Steven Farrell, Thorsten Kurth, Samuel Williams

arXiv:2009.05257v4 · 9.2 · 23 citations

Originality: Synthesis-oriented

AI Analysis

The work provides a practical tool for developers and researchers optimizing deep learning applications on GPUs. It is incremental, however, because it builds on existing tools such as the Empirical Roofline Toolkit and Nsight Compute.

The paper addresses the challenge of analyzing deep learning performance on NVIDIA GPUs by introducing a methodology for hierarchical Roofline analysis. The methodology is validated on a climate image segmentation application implemented in both TensorFlow and PyTorch, and the comparison exposes framework-specific performance differences.
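The core idea of a hierarchical Roofline can be shown with a minimal Python sketch. For each memory level (here L1, L2 and HBM), a kernel's arithmetic intensity is its FLOP count divided by the bytes it moves at that level. The attainable performance at that level is the lower of the compute peak and intensity × that level's bandwidth, and the level with the lowest bound is the binding one. The class and function names, the bandwidths and the peak below are illustrative placeholders and do not come from the paper. In the described methodology, the real ceilings come from empirical machine characterization.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryLevel:
    """One level of the GPU memory hierarchy as seen by a single kernel (hypothetical type)."""
    name: str
    bandwidth_gbs: float  # bandwidth ceiling in GB/s (placeholder, not a measurement)
    bytes_moved: float    # bytes the kernel transferred at this level


def roofline_bounds(flops, peak_gflops, levels):
    """Map each level name to (arithmetic intensity in FLOP/byte, attainable GFLOP/s)."""
    bounds = {}
    for level in levels:
        intensity = flops / level.bytes_moved
        # FLOP/byte x GB/s = GFLOP/s; the compute peak caps the memory-derived bound.
        bounds[level.name] = (intensity, min(peak_gflops, intensity * level.bandwidth_gbs))
    return bounds


def binding_level(bounds):
    """Return the level whose ceiling gives the lowest attainable performance."""
    return min(bounds, key=lambda name: bounds[name][1])


if __name__ == "__main__":
    # Illustrative numbers only; substitute empirically measured ceilings.
    levels = [
        MemoryLevel("L1", bandwidth_gbs=14000.0, bytes_moved=4e9),
        MemoryLevel("L2", bandwidth_gbs=3000.0, bytes_moved=1e9),
        MemoryLevel("HBM", bandwidth_gbs=800.0, bytes_moved=2.5e8),
    ]
    bounds = roofline_bounds(flops=2e9, peak_gflops=15000.0, levels=levels)
    for name, (ai, gflops) in bounds.items():
        print(f"{name}: AI = {ai:.2f} FLOP/B, bound = {gflops:.0f} GFLOP/s")
    print("binding level:", binding_level(bounds))
```

Keeping one intensity per level, rather than a single DRAM-only intensity, is what makes the analysis hierarchical. A kernel can look compute-bound against HBM while it is actually limited by L2 or L1 traffic.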

This paper presents a practical methodology for collecting the performance data needed to conduct hierarchical Roofline analysis on NVIDIA GPUs. It discusses extending the Empirical Roofline Toolkit to support a broader range of data precisions as well as Tensor Cores, and it introduces an Nsight Compute-based method for accurately collecting application performance information. This methodology enables automated machine characterization and application characterization for Roofline analysis across the entire memory hierarchy of NVIDIA GPUs, and it is validated on a complex deep learning application for climate image segmentation. We use two versions of the code, written in TensorFlow and PyTorch respectively, to demonstrate the use and effectiveness of this methodology. We highlight how the application utilizes the compute and memory capabilities of the GPU and how its implementation and performance differ between the two deep learning frameworks.
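The Nsight Compute step can be made concrete with a hedged Python sketch. It takes a dictionary of per-kernel metric values that have already been collected (for example, parsed from an `ncu` report) and turns them into FLOP counts per precision, Tensor Core FLOPs, bytes per memory level, and kernel time. The metric names follow the Nsight Compute naming used in published Roofline collection recipes. They are an assumption here and should be checked with `ncu --query-metrics` on the target GPU. The factor of 512 FLOPs per Tensor Core instruction is likewise treated as an assumed, V100-specific value and is exposed as a parameter. All function and constant names are hypothetical.

```python
# Assumed Nsight Compute metric names; verify with `ncu --query-metrics` on your GPU.
INST_METRIC = "sm__sass_thread_inst_executed_op_{prefix}{op}_pred_on.sum"
PRECISION_PREFIX = {"fp64": "d", "fp32": "f", "fp16": "h"}
TENSOR_METRIC = "sm__inst_executed_pipe_tensor.sum"
LEVEL_BYTES_METRIC = {
    "L1": "l1tex__t_bytes.sum",
    "L2": "lts__t_bytes.sum",
    "HBM": "dram__bytes.sum",
}


def _inst_count(metrics, prefix, op):
    """Thread-level instruction count for one precision/op pair (0 if not collected)."""
    return metrics.get(INST_METRIC.format(prefix=prefix, op=op), 0.0)


def flops_by_precision(metrics):
    """FP64/FP32/FP16 FLOPs from add, mul and fma instruction counters."""
    flops = {}
    for precision, prefix in PRECISION_PREFIX.items():
        add = _inst_count(metrics, prefix, "add")
        mul = _inst_count(metrics, prefix, "mul")
        fma = _inst_count(metrics, prefix, "fma")
        # A fused multiply-add performs two floating-point operations.
        flops[precision] = add + mul + 2.0 * fma
    return flops


def tensor_core_flops(metrics, flops_per_inst=512.0):
    """Tensor Core FLOPs; the default factor is an assumed V100-specific value."""
    return metrics.get(TENSOR_METRIC, 0.0) * flops_per_inst


def bytes_by_level(metrics):
    """Bytes moved at each memory level (0 if the metric was not collected)."""
    return {level: metrics.get(name, 0.0) for level, name in LEVEL_BYTES_METRIC.items()}


def kernel_time_s(metrics):
    """Elapsed time in seconds = elapsed SM cycles / SM cycles per second."""
    return metrics["sm__cycles_elapsed.avg"] / metrics["sm__cycles_elapsed.avg.per_second"]
```

Dividing the FLOP totals by the kernel time gives achieved FLOP/s. Dividing them by each level's bytes gives the per-level arithmetic intensity that places the kernel on the hierarchical Roofline.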
