LG DC PF ML · Aug 13, 2019

Exploiting Parallelism Opportunities with Deep Learning Frameworks

Yu Emma Wang, Carole-Jean Wu, Xiaodong Wang, Kim Hazelwood, David Brooks

arXiv:1908.04705v2

Originality: Incremental advance

AI Analysis

The paper addresses performance tuning for machine learning developers and offers an incremental improvement over existing tuning practice.

This paper tackles the challenge of optimizing performance in deep learning frameworks by analyzing key design features and quantifying the role of parallelism. The resulting guidelines outperform the Intel and TensorFlow recommended settings by 1.29x and 1.34x, respectively.

State-of-the-art machine learning frameworks support a wide variety of design features to enable a flexible machine learning programming interface and to ease the programmability burden on machine learning developers. Identifying and using a performance-optimal setting in feature-rich frameworks, however, involves a non-trivial amount of performance profiling efforts and often relies on domain-specific knowledge. This paper takes a deep dive into analyzing the performance impact of key design features in a machine learning framework and quantifies the role of parallelism. The observations and insights distill into a simple set of guidelines that one can use to achieve much higher training and inference speedup. Across a diverse set of real-world deep learning models, the evaluation results show that the proposed performance tuning guidelines outperform the Intel and TensorFlow recommended settings by 1.29x and 1.34x, respectively.
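The profiling burden described above can be made concrete with a brute-force sweep over the two thread-pool sizes that frameworks such as TensorFlow expose: inter-op parallelism (how many operators run concurrently) and intra-op parallelism (how many threads a single operator uses). The sketch below is illustrative only and is not the paper's method or its guidelines. `candidate_configs`, `profile_sweep`, and the `measure` callback are hypothetical names, and skipping any setting where inter × intra exceeds the core count is an assumption of the sketch.

```python
from typing import Callable, Dict, List, Tuple

# (inter_op_threads, intra_op_threads); hypothetical names for this sketch.
Config = Tuple[int, int]


def candidate_configs(num_cores: int) -> List[Config]:
    """Enumerate (inter, intra) pairs whose product does not exceed num_cores.

    Assumption of this sketch: settings that oversubscribe the cores are skipped.
    """
    configs: List[Config] = []
    for inter in range(1, num_cores + 1):
        for intra in range(1, num_cores // inter + 1):
            configs.append((inter, intra))
    return configs


def profile_sweep(
    measure: Callable[[int, int], float],
    num_cores: int,
    repeats: int = 3,
) -> Tuple[Config, Dict[Config, float]]:
    """Run `measure` for every candidate config and return the fastest one.

    `measure(inter, intra)` is a hypothetical user-supplied callback that runs
    one training or inference step under the given thread counts and returns
    its wall-clock time in seconds. The minimum over `repeats` runs is kept to
    damp measurement noise.
    """
    timings: Dict[Config, float] = {}
    for inter, intra in candidate_configs(num_cores):
        timings[(inter, intra)] = min(measure(inter, intra) for _ in range(repeats))
    best = min(timings, key=lambda cfg: timings[cfg])
    return best, timings


if __name__ == "__main__":
    # Synthetic cost model standing in for a real benchmark: work splits
    # across all threads, but each extra inter-op thread adds fixed overhead.
    def toy_measure(inter: int, intra: int) -> float:
        return 100.0 / (inter * intra) + 2.0 * inter

    best, timings = profile_sweep(toy_measure, num_cores=4)
    print(f"{len(timings)} configs profiled; fastest: {best}")
```

Even on a 4-core machine this sweep profiles 8 configurations, each repeated several times, and the count grows with the core count, which is the cost that tuning guidelines aim to avoid. In TensorFlow 2.x, a real `measure` would typically call `tf.config.threading.set_inter_op_parallelism_threads` and `tf.config.threading.set_intra_op_parallelism_threads` in a fresh process for each configuration, because these settings cannot be changed after the runtime has initialized.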
