DC CV LG Nov 12, 2017

Extremely Large Minibatch SGD: Training ResNet-50 on ImageNet in 15 Minutes

Takuya Akiba, Shuji Suzuki, Keisuke Fukuda

arXiv:1711.04325v1 31.9320 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for faster training of deep learning models, particularly for large-scale image classification. The contribution is incremental: it combines existing methods with hardware scaling.

The researchers trained ResNet-50 on ImageNet in 15 minutes on 1024 GPUs by using a large minibatch size of 32k. Accuracy was maintained through techniques such as RMSprop warm-up and a slow-start learning rate schedule.

We demonstrate that training ResNet-50 on ImageNet for 90 epochs can be achieved in 15 minutes with 1024 Tesla P100 GPUs. This was made possible by using a large minibatch size of 32k. To maintain accuracy with this large minibatch size, we employed several techniques such as RMSprop warm-up, batch normalization without moving averages, and a slow-start learning rate schedule. This paper also describes the details of the hardware and software of the system used to achieve the above performance.
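The figures in the abstract imply a tight per-iteration time budget, which the following back-of-the-envelope sketch makes explicit. It is not the authors' code. It assumes that "32k" means 32 × 1024 = 32,768 samples, that the ImageNet (ILSVRC-2012) training set has 1,281,167 images, that the final partial minibatch of each epoch is kept, and that the full 15 minutes is spent on training iterations. The function name `training_budget` and its parameters are hypothetical.

```python
import math


def training_budget(dataset_size, global_batch, num_gpus, epochs, wall_clock_s):
    """Derive per-GPU batch size and per-iteration time from headline figures."""
    # Samples each GPU processes per iteration (data parallelism assumed).
    per_gpu_batch = global_batch // num_gpus
    # Iterations needed to see every sample once; the last batch may be partial.
    iters_per_epoch = math.ceil(dataset_size / global_batch)
    total_iters = iters_per_epoch * epochs
    return {
        "per_gpu_batch": per_gpu_batch,
        "iters_per_epoch": iters_per_epoch,
        "total_iters": total_iters,
        # Average wall-clock time available per iteration.
        "seconds_per_iter": wall_clock_s / total_iters,
        # Aggregate throughput across all GPUs.
        "images_per_second": dataset_size * epochs / wall_clock_s,
    }


# Assumed inputs: 1,281,167 training images, 32k = 32 * 1024 minibatch,
# 1024 GPUs, 90 epochs, 15 minutes of wall-clock time.
budget = training_budget(
    dataset_size=1_281_167,
    global_batch=32 * 1024,
    num_gpus=1024,
    epochs=90,
    wall_clock_s=15 * 60,
)

for name, value in budget.items():
    print(f"{name}: {value}")
```

Under these assumptions each GPU handles 32 samples per iteration, an epoch takes 40 iterations, and the whole run has roughly a quarter of a second per iteration, including gradient communication across all 1024 GPUs.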
