CV Sep 22, 2016

Distributed Training of Deep Neural Networks: Theoretical and Practical Limits of Parallel Scalability

arXiv:1609.06870v4 11.6104 citations

Originality: Synthesis-oriented

AI Analysis

This paper addresses scalability issues relevant to machine learning researchers and practitioners, but its contribution is incremental because it builds on existing distributed training methods.

The paper analyzes the bottlenecks in distributed training of deep neural networks. It finds that data-parallel SGD becomes communication-bound and that fixed theoretical constraints limit effective scaling to a few dozen nodes, which results in poor scalability in practice.

This paper presents a theoretical analysis and practical evaluation of the main bottlenecks on the way to a scalable distributed solution for training Deep Neural Networks (DNNs). The results show that the current state-of-the-art approach, data-parallelized Stochastic Gradient Descent (SGD), quickly turns into a heavily communication-bound problem. In addition, we present simple but fixed theoretical constraints that prevent effective scaling of DNN training beyond a few dozen nodes. This leads to poor scalability of DNN training in most practical scenarios.
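To make the communication-bound claim concrete, the following is a minimal sketch of a toy cost model for one synchronous data-parallel SGD step. It is not the paper's own analysis. It assumes a fixed global batch (so compute time divides evenly across nodes), gradient exchange via a bandwidth-optimal ring allreduce with no overlap of compute and communication, and zero latency. The function names and the example figures (1 s of single-node compute, 250 MB of gradients, 1.25 GB/s links) are hypothetical and chosen only for illustration.

```python
def step_time(nodes, compute_s, grad_bytes, bandwidth_bytes_per_s):
    """Seconds per synchronous SGD step under the toy model (assumption)."""
    # Compute shrinks linearly with node count for a fixed global batch.
    compute = compute_s / nodes
    if nodes == 1:
        return compute
    # Ring allreduce moves 2 * (n - 1) / n * grad_bytes per node,
    # which approaches 2 * grad_bytes and does not shrink with more nodes.
    comm = 2 * (nodes - 1) / nodes * grad_bytes / bandwidth_bytes_per_s
    return compute + comm


def speedup(nodes, compute_s, grad_bytes, bandwidth_bytes_per_s):
    """Speedup over a single node under the same toy model."""
    single = step_time(1, compute_s, grad_bytes, bandwidth_bytes_per_s)
    return single / step_time(nodes, compute_s, grad_bytes, bandwidth_bytes_per_s)


if __name__ == "__main__":
    # Hypothetical workload: 1 s compute, 250 MB gradients, 1.25 GB/s links.
    for n in (1, 2, 4, 8, 16, 32, 64):
        print(n, round(speedup(n, 1.0, 250e6, 1.25e9), 3))
```

Under these assumptions the communication term stays nearly constant while the compute term shrinks, so the speedup is bounded by the ratio of single-node compute time to allreduce time, regardless of how many nodes are added. Real systems overlap communication with computation and may grow the batch size, so the model only shows the qualitative trend described in the abstract.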
