DC LG

Jul 24, 2018

Supporting Very Large Models using Automatic Dataflow Graph Partitioning

Minjie Wang, Chien-chin Huang, Jinyang Li

arXiv:1807.08887v2

21.8173 citations

Originality: Incremental advance

AI Analysis

This work addresses memory and efficiency challenges for researchers and practitioners training large-scale models, representing an incremental improvement in distributed training systems.

The paper tackles the problem of training very large deep neural network models by partitioning them across multiple GPUs to reduce memory footprint, achieving 25% to 400% speedup over alternative approaches in experiments on an 8-GPU machine.

This paper presents Tofu, a system that partitions very large DNN models across multiple GPU devices to reduce per-GPU memory footprint. Tofu is designed to partition a dataflow graph of fine-grained tensor operators in order to work transparently with a general-purpose deep learning platform like MXNet. In order to automatically partition each operator, we propose to describe the semantics of an operator in a simple language which represents tensors as lambda functions mapping from tensor coordinates to values. To optimally partition different operators in a dataflow graph, Tofu uses a recursive search algorithm that minimizes the total communication cost. Our experiments on an 8-GPU machine show that Tofu enables the training of very large CNN and RNN models. It also achieves 25% - 400% speedup over alternative approaches to train very large models.
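The idea of describing an operator as a function from tensor coordinates to values can be illustrated with a small sketch. This is not Tofu's actual description language or API: the helper names (`tensor_from_lists`, `matmul`, `materialize`) and the example matrices are hypothetical, chosen only to show why such a description exposes which input regions each output region needs, which is the information a partitioner relies on.

```python
# Illustrative only: every name here is hypothetical and is not part of Tofu or MXNet.

def tensor_from_lists(rows):
    """Wrap a nested list as a function from coordinates (i, j) to a value."""
    return lambda i, j: rows[i][j]

def matmul(a, b, inner):
    """Describe C = A @ B as a coordinate function; `inner` is the shared dimension."""
    return lambda i, j: sum(a(i, k) * b(k, j) for k in range(inner))

def materialize(t, row_range, col_range):
    """Evaluate a coordinate function over a rectangular region of coordinates."""
    return [[t(i, j) for j in col_range] for i in row_range]

A = tensor_from_lists([[1, 2], [3, 4], [5, 6], [7, 8]])  # 4 x 2
B = tensor_from_lists([[1, 0, 2], [0, 1, 3]])            # 2 x 3
C = matmul(A, B, inner=2)                                # 4 x 3

# Whole output computed in one place.
full = materialize(C, range(4), range(3))

# Partition along the output row dimension: each half reads only the
# matching rows of A (plus all of B), so the halves can be computed on
# separate devices and concatenated.
top = materialize(C, range(0, 2), range(3))
bottom = materialize(C, range(2, 4), range(3))
assert top + bottom == full
```

Because `C(i, j)` only ever reads `A(i, k)` and `B(k, j)`, splitting the `i` range splits the accesses to `A` the same way, while `B` must be available to both halves. A partitioner comparing strategies would weigh that kind of required data movement; the sketch does not attempt the recursive search described in the abstract.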
