ML, LG | Nov 6, 2018

Double Adaptive Stochastic Gradient Optimization

Kin Gutierrez, Jin Li, Cristian Challu, Artur Dubrawski

arXiv:1811.02525v1 1.0

Originality: Incremental advance

AI Analysis

This work addresses optimization challenges in deep learning for researchers and practitioners, but it is incremental as it builds on existing adaptive moment and probabilistic methods.

The paper addresses the optimization of deep learning models with noisy or sparse gradients by proposing DASGrad, a family of double adaptive stochastic gradient methods. Theoretical analysis and empirical validation show improved convergence, with benefits that increase with model complexity and gradient variability.

Adaptive moment methods have been remarkably successful in deep learning optimization, particularly in the presence of noisy and/or sparse gradients. We further the advantages of adaptive moment techniques by proposing a family of double adaptive stochastic gradient methods, DASGrad. They leverage the complementary ideas of the adaptive moment algorithms widely used by the deep learning community and recent advances in adaptive probabilistic algorithms. We analyze the theoretical convergence improvements of our approach in a stochastic convex optimization setting, and provide empirical validation of our findings with convex and non-convex objectives. We observe that the benefits of DASGrad increase with the model complexity and variability of the gradients, and we explore the resulting utility in extensions of distribution-matching multitask learning.
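The abstract does not spell out the update rule, so the following is only a rough sketch of how the two ideas it names might combine: an Adam-style adaptive moment step driven by a gradient drawn from an adaptive (non-uniform) sampling distribution. Everything here is an assumption made for illustration and is not the authors' algorithm: the function name `dasgrad_like_sketch`, the least-squares objective, the choice of sampling probabilities proportional to per-example gradient norms, the uniform mixing weight `mix`, and all hyperparameter values.

```python
import numpy as np

def dasgrad_like_sketch(X, y, steps=2000, lr=0.05, beta1=0.9, beta2=0.999,
                        eps=1e-8, mix=0.5, seed=0):
    """Illustrative only: Adam-style moments plus importance sampling
    on the least-squares loss 0.5 * mean((X @ w - y) ** 2)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    m = np.zeros(d)  # first-moment (mean) estimate
    v = np.zeros(d)  # second-moment (uncentered variance) estimate
    row_norms = np.linalg.norm(X, axis=1)
    for t in range(1, steps + 1):
        # Assumption: recompute every residual each step for clarity.
        # This costs O(n * d); a practical method would use cheaper scores.
        residuals = X @ w - y
        # Per-example gradient norm is |residual_i| * ||x_i||.
        scores = np.abs(residuals) * row_norms + 1e-12
        # Mix with the uniform distribution so every p_i stays above mix / n.
        p = mix / n + (1.0 - mix) * scores / scores.sum()
        i = rng.choice(n, p=p)
        # Reweight by 1 / (n * p_i) so the estimate of the mean gradient
        # remains unbiased under non-uniform sampling.
        g = residuals[i] * X[i] / (n * p[i])
        # Standard Adam moment updates with bias correction.
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        m_hat = m / (1.0 - beta1 ** t)
        v_hat = v / (1.0 - beta2 ** t)
        w -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return w
```

Calling `dasgrad_like_sketch(X, y)` on a small synthetic regression problem returns a weight vector. The step size adapts per coordinate through `m` and `v`, and the sampling distribution adapts per example through `p`.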
