LGOCMar 22, 2016

Doubly Random Parallel Stochastic Methods for Large Scale Learning

arXiv:1603.06782v12 citations
Originality Incremental advance
AI Analysis

This work addresses scalability issues in machine learning for practitioners dealing with massive datasets and high dimensions, though it is incremental as it builds on existing parallel and stochastic methods.

The paper tackles the problem of large-scale learning with high-dimensional feature vectors and many training examples by proposing RAPSA, a doubly random parallel stochastic algorithm that converges to the optimal classifier for convex objectives, achieving linear convergence rates in expectation with constant stepsizes.

We consider learning problems over training sets in which both, the number of training examples and the dimension of the feature vectors, are large. To solve these problems we propose the random parallel stochastic algorithm (RAPSA). We call the algorithm random parallel because it utilizes multiple processors to operate in a randomly chosen subset of blocks of the feature vector. We call the algorithm parallel stochastic because processors choose elements of the training set randomly and independently. Algorithms that are parallel in either of these dimensions exist, but RAPSA is the first attempt at a methodology that is parallel in both, the selection of blocks and the selection of elements of the training set. In RAPSA, processors utilize the randomly chosen functions to compute the stochastic gradient component associated with a randomly chosen block. The technical contribution of this paper is to show that this minimally coordinated algorithm converges to the optimal classifier when the training objective is convex. In particular, we show that: (i) When using decreasing stepsizes, RAPSA converges almost surely over the random choice of blocks and functions. (ii) When using constant stepsizes, convergence is to a neighborhood of optimality with a rate that is linear in expectation. RAPSA is numerically evaluated on the MNIST digit recognition problem.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes