ML · AI · LG · OC · May 17, 2018

Interpolatron: Interpolation or Extrapolation Schemes to Accelerate Optimization for Deep Neural Networks

arXiv:1805.06753v1 · 3 citations
Originality: Incremental advance
AI Analysis

The paper addresses slow convergence in nonconvex optimization, a practical concern for deep learning practitioners. It appears incremental because it builds on classical extrapolation schemes.

The paper aims to accelerate optimization for deep neural networks by proposing an interpolation scheme called Interpolatron. The authors report that it converges much faster than state-of-the-art methods such as SGD with momentum and Adam when training deep ResNets on CIFAR-10 and ImageNet.

In this paper we explore acceleration techniques for large-scale nonconvex optimization problems, with a special focus on deep neural networks. The extrapolation scheme is a classical approach for accelerating stochastic gradient descent in convex optimization, but it typically does not work well for nonconvex optimization. As an alternative, we propose an interpolation scheme to accelerate nonconvex optimization and call the method Interpolatron. We explain the motivation behind Interpolatron and conduct a thorough empirical analysis. Empirical results on very deep DNNs (e.g., 98-layer ResNet and 200-layer ResNet) on CIFAR-10 and ImageNet show that Interpolatron can converge much faster than state-of-the-art methods such as SGD with momentum and Adam. Furthermore, Anderson's acceleration, in which mixing coefficients are computed by least-squares estimation, can also be used to improve performance. Both Interpolatron and Anderson's acceleration are easy to implement and tune. We also show that Interpolatron has a linear convergence rate under certain regularity assumptions.
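
The abstract mentions Anderson's acceleration with mixing coefficients obtained by least squares. As a rough illustration of that idea only, here is a minimal sketch of textbook Anderson mixing applied to a generic fixed-point map. It is not the paper's implementation: the function name `anderson_accelerate`, the history length `m`, and the toy linear map are all assumptions made for this example. In a training setting, `g` would presumably stand for one optimizer step on the parameter vector.

```python
import numpy as np

def anderson_accelerate(g, x0, m=5, iters=50, tol=1e-10):
    """Textbook Anderson mixing for the fixed-point problem x = g(x).

    Hypothetical helper for illustration; not taken from the paper.
    Returns the final iterate and the number of iterations used.
    """
    x = np.asarray(x0, dtype=float)
    gx = g(x)
    G_hist = [gx]        # history of g(x_i)
    F_hist = [gx - x]    # history of residuals f_i = g(x_i) - x_i
    x = gx               # first step is a plain fixed-point update
    for k in range(1, iters):
        gx = g(x)
        f = gx - x
        if np.linalg.norm(f) < tol:
            return gx, k
        G_hist.append(gx)
        F_hist.append(f)
        # Keep at most m + 1 entries, i.e. at most m difference columns.
        G_hist = G_hist[-(m + 1):]
        F_hist = F_hist[-(m + 1):]
        dF = np.diff(np.stack(F_hist, axis=1), axis=1)
        dG = np.diff(np.stack(G_hist, axis=1), axis=1)
        # Mixing coefficients from least squares: min_gamma ||f - dF @ gamma||.
        gamma, *_ = np.linalg.lstsq(dF, f, rcond=None)
        x = gx - dG @ gamma
    return x, iters

# Toy contraction g(x) = A x + b (an assumed example, not from the paper).
A = np.array([[0.9, 0.05], [0.0, 0.8]])
b = np.array([1.0, 2.0])
x_acc, n_steps = anderson_accelerate(lambda x: A @ x + b, np.zeros(2))
print(x_acc, n_steps)
```

On a linear map like this one, the least-squares step can recover the fixed point in a handful of iterations, whereas the plain iteration would shrink the error by only about 10% per step. How this carries over to stochastic, nonconvex training is the subject of the paper's experiments and is not shown by this sketch.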

