LG · AI · Jan 21, 2022

Accelerate Model Parallel Training by Using Efficient Graph Traversal Order in Device Placement

arXiv:2201.09676v1
AI Analysis

This work targets training speed in model parallelization, a practical concern for researchers and practitioners. Its contribution is incremental: it builds on existing device placement methods and isolates the effect of graph traversal order.

The paper investigates how different graph traversal orders affect device placement decisions in model parallel training, finding that the optimal order depends on neural network type and graph features, and provides recommendations to improve training time.

Modern neural networks require long training to reach decent performance on massive datasets. One common approach to speed up training is model parallelization, where large neural networks are split across multiple devices. However, different device placements of the same neural network lead to different training times. Most existing device placement solutions treat the problem as sequential decision-making: they traverse the neural network graph and assign its neurons to devices one at a time. This work studies the impact of graph traversal order on device placement. In particular, we empirically study how different graph traversal orders lead to different device placements, which in turn affect the training execution time. Our experimental results show that the best graph traversal order depends on the type of neural network and the features of its computation graph. We also provide recommendations on choosing a graph traversal order in device placement for various neural network families to improve training time in model parallelization.
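To make the abstract's central point concrete, the following is a minimal sketch of how visiting order alone can change a placement. Everything in it is an assumption for illustration: the toy graph `GRAPH`, the helper names, and especially `place_in_chunks`, a naive rule that cuts the visiting order into contiguous chunks. It stands in for a real placement policy and is not the method evaluated in the paper.

```python
from collections import deque

# Hypothetical toy computation graph: node -> list of successor nodes.
GRAPH = {
    "input": ["conv1", "conv2"],
    "conv1": ["add"],
    "conv2": ["add"],
    "add": ["dense"],
    "dense": [],
}


def bfs_order(graph, root):
    """Breadth-first visiting order starting from `root`."""
    seen = {root}
    queue = deque([root])
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for succ in graph[node]:
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return order


def dfs_preorder(graph, root):
    """Depth-first (preorder) visiting order starting from `root`."""
    seen = set()
    order = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # Push successors reversed so the first successor is visited first.
        stack.extend(reversed(graph[node]))
    return order


def place_in_chunks(order, num_devices):
    """Toy sequential placement (an assumption, not the paper's policy):
    split the visiting order into contiguous chunks, one chunk per device."""
    chunk = -(-len(order) // num_devices)  # ceiling division
    return {node: i // chunk for i, node in enumerate(order)}


bfs = bfs_order(GRAPH, "input")
dfs = dfs_preorder(GRAPH, "input")
bfs_placement = place_in_chunks(bfs, num_devices=2)
dfs_placement = place_in_chunks(dfs, num_devices=2)

print("BFS order:", bfs, "->", bfs_placement)
print("DFS order:", dfs, "->", dfs_placement)
```

On this toy graph the two orders send `conv2` and `add` to different devices, so the same network ends up with a different set of cross-device edges, and hence different communication cost, depending only on the traversal.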

Code Implementations: 1 repo
