A Parallel SGD method with Strong Convergence
The paper addresses the need for efficient optimization in machine learning, but the contribution appears incremental because it builds directly on existing SGD and batch descent techniques.
The paper tackles the problem of parallelizing stochastic gradient descent (SGD). It proposes a method in which parallel SGD runs, one per node, supply the search direction for each iteration of a batch descent method. The authors state that the method has strong convergence properties and report experiments on high-dimensional datasets that show its value.
This paper proposes a novel parallel stochastic gradient descent (SGD) method that is obtained by applying parallel sets of SGD iterations (each set operating on one node using the data residing in it) for finding the direction in each iteration of a batch descent method. The method has strong convergence properties. Experiments on datasets with high dimensional feature spaces show the value of this method.
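The scheme described above can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a least-squares objective, simulates the nodes sequentially in one process, and uses a plain Armijo backtracking line search with a fallback to the negative gradient when the averaged SGD direction is not a descent direction. All function and parameter names (`local_sgd`, `parallel_sgd_batch_descent`, `local_steps`, and so on) are hypothetical.

```python
import random


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def full_loss(w, data):
    # Mean squared error over all points (assumed objective, not from the paper).
    return sum((dot(w, x) - y) ** 2 for x, y in data) / (2 * len(data))


def full_grad(w, data):
    g = [0.0] * len(w)
    for x, y in data:
        r = dot(w, x) - y
        for j, xj in enumerate(x):
            g[j] += r * xj
    return [gj / len(data) for gj in g]


def local_sgd(w, shard, steps, lr, rng):
    # One node: SGD started at the shared point w, using only its own shard.
    w = list(w)
    for _ in range(steps):
        x, y = rng.choice(shard)
        r = dot(w, x) - y
        w = [wj - lr * r * xj for wj, xj in zip(w, x)]
    return w


def parallel_sgd_batch_descent(shards, dim, outer_iters=20, local_steps=50,
                               lr=0.05, seed=0):
    rng = random.Random(seed)
    data = [pt for shard in shards for pt in shard]
    w = [0.0] * dim
    for _ in range(outer_iters):
        g = full_grad(w, data)
        # Each node runs SGD from w (simulated sequentially here).
        local = [local_sgd(w, shard, local_steps, lr, rng) for shard in shards]
        avg = [sum(ws[j] for ws in local) / len(local) for j in range(dim)]
        # Direction for the batch step: from w toward the averaged node result.
        d = [a - wj for a, wj in zip(avg, w)]
        slope = dot(g, d)
        if slope >= 0:
            # Assumed safeguard: fall back to steepest descent.
            d = [-gj for gj in g]
            slope = -dot(g, g)
        if slope == 0:
            break  # gradient is zero, nothing left to do
        f0 = full_loss(w, data)
        t = 1.0
        while t > 1e-10:
            cand = [wj + t * dj for wj, dj in zip(w, d)]
            if full_loss(cand, data) <= f0 + 1e-4 * t * slope:
                w = cand  # Armijo condition met, accept the step
                break
            t *= 0.5
        else:
            break  # no acceptable step length found
    return w
```

To try it, split a list of `(features, target)` pairs into one shard per node, for example `shards = [points[i::4] for i in range(4)]`, and call `parallel_sgd_batch_descent(shards, dim=len(points[0][0]))`. Because a step is accepted only when the line search condition holds on the full objective, the loss never increases from one outer iteration to the next in this sketch.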