LG, AI, ML | Jan 16, 2013

Adaptive learning rates and parallelization for stochastic, sparse, non-smooth gradients

arXiv:1301.3764v2 | 30 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient and robust optimization in machine learning, though it is incremental as it builds on prior adaptive learning rate methods.

The paper tackles the problem of automating learning rate tuning and improving robustness in stochastic gradient descent by extending an existing adaptive framework to handle minibatch parallelization, sparse gradients, and non-smooth loss functions, resulting in a hyper-parameter-free algorithm with linear complexity.

Recent work has established an empirically successful framework for adapting learning rates for stochastic gradient descent (SGD). This effectively removes the need for tuning, while automatically reducing learning rates over time on stationary problems and permitting learning rates to grow appropriately in non-stationary tasks. Here, we extend the idea in three directions: addressing proper minibatch parallelization, including reweighted updates for sparse or orthogonal gradients, and improving robustness on non-smooth loss functions, in the process replacing the diagonal Hessian estimation procedure, which may not always be available, with a robust finite-difference approximation. The final algorithm integrates all these components, has linear complexity and is hyper-parameter free.
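To make the finite-difference idea in the abstract concrete, here is a minimal sketch of one way to estimate per-coordinate curvature from two gradient evaluations and use it to scale an update. It is not the paper's algorithm: the function names (`fd_diag_curvature`, `curvature_scaled_step`), the fixed perturbation `delta`, the toy quadratic loss, and the plain `1/h` step are all assumptions made for illustration, and the running averages and noise handling a stochastic method would need are omitted.

```python
from typing import Callable, List

Vector = List[float]


def fd_diag_curvature(
    grad: Callable[[Vector], Vector], theta: Vector, delta: float = 1e-3
) -> Vector:
    """Estimate per-coordinate curvature from two gradient evaluations.

    All coordinates are perturbed at once, so the cost is linear in the
    number of parameters. This recovers the Hessian diagonal exactly only
    when off-diagonal terms vanish; in general it yields |H @ ones|.
    """
    g0 = grad(theta)
    g1 = grad([t + delta for t in theta])
    # Absolute value keeps the estimate non-negative; the small floor
    # (an illustrative choice) guards against division by zero later.
    return [max(abs((b - a) / delta), 1e-12) for a, b in zip(g0, g1)]


def curvature_scaled_step(
    grad: Callable[[Vector], Vector], theta: Vector, delta: float = 1e-3
) -> Vector:
    """Take one gradient step with a per-coordinate rate of 1 / curvature."""
    g = grad(theta)
    h = fd_diag_curvature(grad, theta, delta)
    return [t - gi / hi for t, gi, hi in zip(theta, g, h)]


# Toy problem (hypothetical): f(theta) = 0.5 * sum(a_i * theta_i ** 2),
# whose Hessian is diagonal with entries a_i.
CURVATURES = [1.0, 10.0, 100.0]


def quadratic_grad(theta: Vector) -> Vector:
    return [a * t for a, t in zip(CURVATURES, theta)]


if __name__ == "__main__":
    start = [1.0, -2.0, 0.5]
    print(fd_diag_curvature(quadratic_grad, start))
    print(curvature_scaled_step(quadratic_grad, start))
```

On this noise-free diagonal quadratic, the curvature estimate matches the true diagonal up to floating-point error, so a single scaled step lands near the minimum despite the coordinates differing in curvature by two orders of magnitude. With stochastic or non-smooth gradients a single finite difference is noisy, which is why a robust variant would be needed in practice.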

