LG, ML | Jan 7, 2020

Stochastic Weight Averaging in Parallel: Large-Batch Training that Generalizes Well

arXiv:2001.02312v1 | 78 citations
AI Analysis

This paper addresses the challenge of reducing deep learning training time without sacrificing model performance. The contribution appears incremental, since it builds on existing weight averaging techniques.

The paper tackles the problem of accelerating deep neural network training while maintaining generalization. It proposes Stochastic Weight Averaging in Parallel (SWAP), which uses large mini-batches to reach an approximate solution quickly and then refines it by averaging the weights of models trained independently in parallel. The resulting models generalize as well as those trained with small mini-batches but are produced in substantially less time, as demonstrated on CIFAR10, CIFAR100, and ImageNet.

We propose Stochastic Weight Averaging in Parallel (SWAP), an algorithm to accelerate DNN training. Our algorithm uses large mini-batches to compute an approximate solution quickly and then refines it by averaging the weights of multiple models computed independently and in parallel. The resulting models generalize equally well as those trained with small mini-batches but are produced in a substantially shorter time. We demonstrate the reduction in training time and the good generalization performance of the resulting models on the computer vision datasets CIFAR10, CIFAR100, and ImageNet.
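The two stages described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a two-parameter linear regression in place of a DNN, runs the "parallel" workers one after another in a single process, and every name (`make_data`, `sgd`, `average_weights`, `swap_sketch`) and hyperparameter (batch sizes, step counts, learning rates) is hypothetical and chosen only for illustration.

```python
import random

def make_data(n, seed=0):
    # Toy regression data: y = 3x + 0.5 plus Gaussian noise (illustrative only).
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, 3.0 * x + 0.5 + rng.gauss(0.0, 0.1)) for x in xs]

def mse(w, data):
    # Mean squared error of the linear model w[0] * x + w[1].
    return sum((w[0] * x + w[1] - y) ** 2 for x, y in data) / len(data)

def sgd(w, data, batch_size, steps, lr, seed):
    # Plain mini-batch SGD on the MSE loss; returns a new weight list.
    rng = random.Random(seed)
    w = list(w)
    for _ in range(steps):
        batch = rng.sample(data, batch_size)
        errs = [(w[0] * x + w[1] - y, x) for x, y in batch]
        g0 = sum(2.0 * e * x for e, x in errs) / batch_size
        g1 = sum(2.0 * e for e, _ in errs) / batch_size
        w = [w[0] - lr * g0, w[1] - lr * g1]
    return w

def average_weights(models):
    # Element-wise mean of the weight vectors of all models.
    return [sum(ws) / len(models) for ws in zip(*models)]

def swap_sketch(data, num_workers=4):
    # Stage 1: large mini-batches give a quick approximate solution.
    start = sgd([0.0, 0.0], data, batch_size=256, steps=50, lr=0.4, seed=1)
    # Stage 2: each worker refines the same starting point independently with
    # small mini-batches and its own random seed. In a real setup these runs
    # would execute in parallel; here they run sequentially for simplicity.
    workers = [
        sgd(start, data, batch_size=16, steps=100, lr=0.05, seed=100 + k)
        for k in range(num_workers)
    ]
    # Final step: average the independently refined weights.
    return start, workers, average_weights(workers)

data = make_data(1024)
start, workers, averaged = swap_sketch(data)
print("large-batch stage loss:", mse(start, data))
print("averaged model loss:  ", mse(averaged, data))
```

Because the toy loss is convex, the averaged model's loss can never exceed the mean loss of the individual workers. For a DNN no such guarantee holds, and the generalization benefit reported in the abstract is an empirical claim about the authors' experiments rather than something this sketch demonstrates.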

