LGOCSep 22, 2016

Decoupled Asynchronous Proximal Stochastic Gradient Descent with Variance Reduction

arXiv:1609.06804v28 citations
AI Analysis

This work addresses convergence bottlenecks in asynchronous optimization for large-scale machine learning, offering an incremental improvement over existing methods.

The paper tackles the slow convergence of decoupled asynchronous proximal stochastic gradient descent (DAP-SGD) due to non-zero variance by proposing DAP-SVRG, a method that achieves linear convergence for strongly convex problems, as validated through large-scale experiments.

In the era of big data, optimizing large scale machine learning problems becomes a challenging task and draws significant attention. Asynchronous optimization algorithms come out as a promising solution. Recently, decoupled asynchronous proximal stochastic gradient descent (DAP-SGD) is proposed to minimize a composite function. It is claimed to be able to off-loads the computation bottleneck from server to workers by allowing workers to evaluate the proximal operators, therefore, server just need to do element-wise operations. However, it still suffers from slow convergence rate because of the variance of stochastic gradient is nonzero. In this paper, we propose a faster method, decoupled asynchronous proximal stochastic variance reduced gradient descent method (DAP-SVRG). We prove that our method has linear convergence for strongly convex problem. Large-scale experiments are also conducted in this paper, and results demonstrate our theoretical analysis.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes