LG · CV · Oct 27, 2021

Multilayer Lookahead: a Nested Version of Lookahead

arXiv:2110.14254v1 · 2 citations
Originality: Incremental advance
AI Analysis

This work addresses optimization challenges in deep learning, offering incremental improvements for training neural networks.

The paper tackles the problem of improving optimization for deep neural networks by proposing Multilayer Lookahead, a nested version of the Lookahead optimizer, which achieves better performance than Lookahead on CIFAR-10, CIFAR-100, and MNIST GAN tasks.

In recent years, SGD and its variants have become the standard tool to train Deep Neural Networks. In this paper, we focus on the recently proposed variant Lookahead, which improves upon SGD in a wide range of applications. Following this success, we study an extension of this algorithm, the \emph{Multilayer Lookahead} optimizer, which recursively wraps Lookahead around itself. We prove the convergence of Multilayer Lookahead with two layers to a stationary point of smooth non-convex functions with $O(\frac{1}{\sqrt{T}})$ rate. We also justify the improved generalization of both Lookahead over SGD, and of Multilayer Lookahead over Lookahead, by showing how they amplify the implicit regularization effect of SGD. We empirically verify our results and show that Multilayer Lookahead outperforms Lookahead on CIFAR-10 and CIFAR-100 classification tasks, and on GAN training on the MNIST dataset.
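
The recursive wrapping described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the standard Lookahead update (run the inner optimizer for k steps from the slow weights, then move the slow weights a fraction alpha toward the resulting fast weights), uses a deterministic gradient step in place of SGD so the run is reproducible, and all names (`sgd_step`, `lookahead`, `multilayer_lookahead`, `ks`, `alphas`) are hypothetical.

```python
from typing import Callable, List

Vector = List[float]
Step = Callable[[Vector], Vector]  # maps current weights to updated weights


def sgd_step(grad: Callable[[Vector], Vector], lr: float) -> Step:
    """Base optimizer: one plain gradient step (no noise, for reproducibility)."""
    def step(w: Vector) -> Vector:
        g = grad(w)
        return [wi - lr * gi for wi, gi in zip(w, g)]
    return step


def lookahead(inner: Step, k: int, alpha: float) -> Step:
    """Wrap an inner step: run it k times starting from the slow weights,
    then interpolate the slow weights toward the resulting fast weights."""
    def step(slow: Vector) -> Vector:
        fast = list(slow)  # fast weights restart from the slow weights
        for _ in range(k):
            fast = inner(fast)
        return [s + alpha * (f - s) for s, f in zip(slow, fast)]
    return step


def multilayer_lookahead(base: Step, ks: List[int], alphas: List[float]) -> Step:
    """Nest Lookahead around itself, one layer per (k, alpha) pair.
    The first pair is the innermost layer (assumed ordering)."""
    step = base
    for k, alpha in zip(ks, alphas):
        step = lookahead(step, k, alpha)
    return step


# Toy objective f(w) = 0.5 * ||w||^2, whose gradient is w itself.
grad = lambda w: list(w)

# Two-layer variant: Lookahead wrapped around Lookahead wrapped around the base step.
step = multilayer_lookahead(sgd_step(grad, lr=0.1), ks=[5, 5], alphas=[0.5, 0.5])

w = [1.0, -2.0]
for _ in range(20):
    w = step(w)  # each outer step costs 5 * 5 = 25 base gradient steps
```

In this sketch a single Lookahead layer is just `multilayer_lookahead(base, [k], [alpha])`, so the nesting depth is simply the length of the hyperparameter lists. A real training setup would also need to carry optimizer state (for example momentum buffers) across layers, which this stateless version omits.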

