LG · DC · PF · ML · Aug 13, 2019

Exploiting Parallelism Opportunities with Deep Learning Frameworks

arXiv:1908.04705v2

AI Analysis

The paper addresses performance tuning for machine learning developers, offering incremental improvements over existing methods.

This paper tackles the challenge of optimizing performance in deep learning frameworks by analyzing key design features and quantifying parallelism's role, resulting in guidelines that achieve up to 1.34x speedup over recommended settings.

State-of-the-art machine learning frameworks support a wide variety of design features to enable a flexible machine learning programming interface and to ease the programmability burden on machine learning developers. Identifying and using a performance-optimal setting in feature-rich frameworks, however, involves a non-trivial amount of performance profiling effort and often relies on domain-specific knowledge. This paper takes a deep dive into analyzing the performance impact of key design features in a machine learning framework and quantifies the role of parallelism. The observations and insights distill into a simple set of guidelines that one can use to achieve much higher training and inference speedup. Across a diverse set of real-world deep learning models, the evaluation results show that the proposed performance tuning guidelines outperform the Intel and TensorFlow recommended settings by 1.29x and 1.34x, respectively.
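The abstract does not spell out the guidelines themselves, so the following is only a minimal sketch of the profiling loop that such guidelines are meant to shorten: sweep a few candidate parallelism settings, time each one, and report the speedup of the best setting over a baseline. The names `candidate_settings`, `tune`, and `toy_benchmark`, the two-knob `(inter_op, intra_op)` setting, and the cost model are all hypothetical and not taken from the paper. In a real TensorFlow run the benchmark would typically apply the setting with `tf.config.threading.set_inter_op_parallelism_threads` and `tf.config.threading.set_intra_op_parallelism_threads` before timing a training or inference step.

```python
from itertools import product
from typing import Callable, Iterable, List, Tuple

# Hypothetical setting: (inter_op_threads, intra_op_threads).
Setting = Tuple[int, int]


def candidate_settings(num_cores: int) -> List[Setting]:
    """Enumerate pool-size pairs that do not oversubscribe the cores."""
    sizes = [n for n in range(1, num_cores + 1) if num_cores % n == 0]
    return [(a, b) for a, b in product(sizes, sizes) if a * b <= num_cores]


def tune(
    candidates: Iterable[Setting],
    benchmark: Callable[[Setting], float],
    baseline: Setting,
) -> Tuple[Setting, float]:
    """Return the fastest setting and its speedup over the baseline.

    `benchmark` maps a setting to a measured time in seconds.
    """
    timings = {s: benchmark(s) for s in candidates}
    if baseline not in timings:
        timings[baseline] = benchmark(baseline)
    best = min(timings, key=timings.get)
    return best, timings[baseline] / timings[best]


def toy_benchmark(setting: Setting) -> float:
    """Made-up cost model standing in for a real timed workload."""
    inter_op, intra_op = setting
    return 8.0 / intra_op + 2.0 / inter_op


if __name__ == "__main__":
    best, speedup = tune(candidate_settings(4), toy_benchmark, baseline=(1, 1))
    print(f"best setting: {best}, speedup over baseline: {speedup:.2f}x")
```

Passing the benchmark in as a callable keeps the sweep independent of any framework, so the same loop can wrap a real timed workload or a synthetic model like the one above.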

Code Implementations: 1 repo