DC · AI · LG · Jul 24, 2018

An argument in favor of strong scaling for deep neural networks with small datasets

arXiv:1807.09161v3 · 1 citation
Originality: Incremental advance
AI Analysis

The paper addresses a bottleneck for researchers and practitioners working with limited data: it shows how to speed up training by adding GPUs without losing accuracy relative to the sequential model.

The paper tackles the problem of parallelizing deep neural networks when only small datasets are available, showing that weak scaling fails to converge or match sequential accuracy, while strong scaling achieves identical accuracy with good scalability up to 32 GPUs.

In recent years, with the popularization of deep learning frameworks and large datasets, researchers have started parallelizing their models in order to train faster. This is crucially important, because they typically explore many hyperparameters in order to find the best ones for their applications. This process is time consuming and, consequently, speeding up training improves productivity. One approach to parallelizing deep learning models, followed by many researchers, is based on weak scaling: the minibatches increase in size as new GPUs are added to the system. In addition, new learning rate schedules have been proposed to fix optimization issues that occur with large minibatch sizes. In this paper, however, we show that the recommendations provided by recent work do not apply to models that lack large datasets. In fact, we argue in favor of using strong scaling to achieve reliable performance in such cases. We evaluated our approach with up to 32 GPUs and show that weak scaling not only fails to match the accuracy of the sequential model but also fails to converge most of the time. Meanwhile, strong scaling has good scalability while having exactly the same accuracy as a sequential implementation.
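The difference between the two scaling regimes described in the abstract comes down to simple batch arithmetic, which the following minimal sketch illustrates. It is not the authors' code: the function names, the dataset size of 4096 samples, and the base minibatch of 32 are hypothetical values chosen for illustration. The sketch only counts minibatch sizes and parameter updates per epoch; it does not model convergence.

```python
import math


def weak_scaling(per_gpu_batch, num_gpus):
    """Weak scaling: each GPU keeps the same minibatch size,
    so the global minibatch grows with the number of GPUs."""
    return {"per_gpu": per_gpu_batch, "global": per_gpu_batch * num_gpus}


def strong_scaling(global_batch, num_gpus):
    """Strong scaling: the global minibatch stays fixed and is
    split evenly across GPUs, so each GPU processes fewer samples."""
    if global_batch % num_gpus != 0:
        raise ValueError("global_batch must be divisible by num_gpus")
    return {"per_gpu": global_batch // num_gpus, "global": global_batch}


def updates_per_epoch(dataset_size, global_batch):
    """Number of parameter updates in one pass over the dataset."""
    return math.ceil(dataset_size / global_batch)


if __name__ == "__main__":
    dataset_size = 4096  # hypothetical small dataset
    base_batch = 32      # hypothetical sequential minibatch size

    for gpus in (1, 8, 32):
        weak = weak_scaling(base_batch, gpus)
        strong = strong_scaling(base_batch, gpus)
        print(
            f"{gpus:>2} GPUs | weak: global={weak['global']:>4}, "
            f"updates/epoch={updates_per_epoch(dataset_size, weak['global']):>3} | "
            f"strong: per_gpu={strong['per_gpu']:>2}, "
            f"updates/epoch={updates_per_epoch(dataset_size, strong['global']):>3}"
        )
```

Under these assumed numbers, weak scaling on 32 GPUs leaves only a handful of updates per epoch because the global minibatch becomes a large fraction of the dataset, whereas strong scaling keeps the update count, and therefore the optimization trajectory, the same as in the sequential run.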
