SE · Jul 12, 2017

DeepProf: Performance Analysis for Deep Learning Applications via Mining GPU Execution Patterns

arXiv:1707.03750v1 · 7 citations
Originality: Incremental advance
AI Analysis

This work addresses performance-analysis challenges faced by developers and researchers who use deep learning frameworks. It is an incremental advance, building on existing trace-analysis methods.

The paper tackles the difficulty of analyzing the performance of deep learning applications, which stems from the gap between source code and the GPU operations it triggers. It introduces DeepProf, a tool that automatically processes GPU traces, uses suffix trees to extract repeated patterns, and generates performance-analysis reports; an empirical study verifies its effectiveness.

Deep learning applications are computation-intensive and often employ GPUs as their underlying computing devices. Deep learning frameworks provide powerful programming interfaces, but the gap between source code and the actual GPU operations makes it difficult to analyze the performance of deep learning applications. In this paper, by examining the features of GPU traces and deep learning applications, we use the suffix tree structure to extract repeated patterns in GPU traces. Performance analysis graphs can then be generated from the preprocessed GPU traces. We further present DeepProf, a novel tool that automatically processes GPU traces and generates performance analysis reports for deep learning applications. An empirical study verifies the effectiveness of DeepProf in performance analysis and diagnosis. We also find some interesting properties of TensorFlow, which can be used to guide deep learning system setup.
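To make the pattern-extraction step concrete, here is a minimal Python sketch that treats a GPU trace as a sequence of kernel names and looks for the repeated kernel sequence that most plausibly corresponds to one training iteration. Several assumptions are ours, not the paper's: kernel names are the only trace feature used; a naive, uncompressed suffix trie (quadratic in trace length) stands in for a linear-time suffix tree; and the selection rule (pattern length × non-overlapping occurrences, ties broken by occurrence count) is a heuristic chosen for illustration, not DeepProf's actual criterion. All kernel names and function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


class TrieNode:
    """One node of an uncompressed suffix trie over kernel names."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        # Start offsets of every suffix whose path passes through this node,
        # i.e. every position where this node's kernel sequence occurs.
        self.starts: List[int] = []


def build_suffix_trie(trace: List[str]) -> TrieNode:
    """Insert every suffix of the trace (O(n^2) time and space; sketch only)."""
    root = TrieNode()
    for start in range(len(trace)):
        node = root
        for name in trace[start:]:
            node = node.children.setdefault(name, TrieNode())
            node.starts.append(start)
    return root


def non_overlapping_count(starts: List[int], length: int) -> int:
    """Greedily count occurrences of a length-`length` pattern that do not overlap."""
    count, next_free = 0, 0
    for s in sorted(starts):
        if s >= next_free:
            count += 1
            next_free = s + length
    return count


def most_repeated_pattern(
    trace: List[str], min_len: int = 2
) -> Optional[Tuple[List[str], int]]:
    """Return (pattern, occurrences) for the repeated sequence covering most of the trace.

    Hypothetical heuristic: score = len(pattern) * non-overlapping occurrences,
    ties broken in favour of more occurrences (the shorter, per-iteration unit).
    """
    root = build_suffix_trie(trace)
    best: Optional[Tuple[int, int, List[str]]] = None  # (coverage, count, pattern)
    stack: List[Tuple[TrieNode, List[str]]] = [(root, [])]
    while stack:
        node, path = stack.pop()
        for name, child in node.children.items():
            pattern = path + [name]
            count = non_overlapping_count(child.starts, len(pattern))
            if count < 2:
                # Longer extensions occur at a subset of these offsets,
                # so they cannot repeat without overlap either: prune.
                continue
            if len(pattern) >= min_len:
                key = (len(pattern) * count, count)
                if best is None or key > best[:2]:
                    best = (key[0], key[1], pattern)
            stack.append((child, pattern))
    if best is None:
        return None
    return best[2], best[1]


if __name__ == "__main__":
    # Synthetic trace: one warm-up copy, then three identical "iterations".
    trace = ["memcpy_HtoD"] + ["conv_fwd", "relu_fwd", "relu_bwd", "conv_bwd", "sgd_update"] * 3
    print(most_repeated_pattern(trace))
    # (['conv_fwd', 'relu_fwd', 'relu_bwd', 'conv_bwd', 'sgd_update'], 3)
```

On this synthetic trace the one-off `memcpy_HtoD` is ignored and the five-kernel iteration is recovered with three occurrences. Rotations of the same iteration (for example, starting at `relu_fwd`) fit only twice, and two-iteration sequences overlap themselves, so both score lower. A production tool would replace the quadratic trie with a linear-time suffix tree construction such as Ukkonen's algorithm.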
