LGMLMay 21, 2018

SmoothOut: Smoothing Out Sharp Minima to Improve Generalization in Deep Learning

arXiv:1805.07898v355 citations
Originality Incremental advance
AI Analysis

This addresses a fundamental issue in deep learning training for researchers and practitioners, offering an incremental improvement over existing noise injection methods.

The paper tackles the problem of sharp minima in deep neural networks, which can lead to poor generalization, especially with large-batch SGD, and proposes SmoothOut, a framework that smooths out sharp minima by perturbing and averaging multiple copies of the network with noise injection, resulting in improved generalization across various experiments.

In Deep Learning, Stochastic Gradient Descent (SGD) is usually selected as a training method because of its efficiency; however, recently, a problem in SGD gains research interest: sharp minima in Deep Neural Networks (DNNs) have poor generalization; especially, large-batch SGD tends to converge to sharp minima. It becomes an open question whether escaping sharp minima can improve the generalization. To answer this question, we propose SmoothOut framework to smooth out sharp minima in DNNs and thereby improve generalization. In a nutshell, SmoothOut perturbs multiple copies of the DNN by noise injection and averages these copies. Injecting noises to SGD is widely used in the literature, but SmoothOut differs in lots of ways: (1) a de-noising process is applied before parameter updating; (2) noise strength is adapted to filter norm; (3) an alternative interpretation on the advantage of noise injection, from the perspective of sharpness and generalization; (4) usage of uniform noise instead of Gaussian noise. We prove that SmoothOut can eliminate sharp minima. Training multiple DNN copies is inefficient, we further propose an unbiased stochastic SmoothOut which only introduces the overhead of noise injecting and de-noising per batch. An adaptive variant of SmoothOut, AdaSmoothOut, is also proposed to improve generalization. In a variety of experiments, SmoothOut and AdaSmoothOut consistently improve generalization in both small-batch and large-batch training on the top of state-of-the-art solutions.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes