LG · Sep 15, 2017

Accelerating SGD for Distributed Deep-Learning Using Approximated Hessian Matrix

arXiv:1709.05069v1
Originality: Incremental advance
AI Analysis

This work addresses the challenge of improving optimization efficiency in distributed deep learning. The contribution appears incremental: it builds on existing second-order methods and reports only preliminary results.

The paper tackles the problem of accelerating stochastic gradient descent (SGD) for distributed deep learning. It introduces a method for computing a rank-m approximation of the inverse Hessian matrix from the differences in gradients and parameters across workers, which yields a distributed approximation of the Newton-Raphson method. Preliminary results highlight both the advantages and the challenges of second-order methods for large stochastic optimization problems.

We introduce a novel method to compute a rank-$m$ approximation of the inverse of the Hessian matrix in the distributed regime. By leveraging the differences in gradients and parameters of multiple workers, we are able to efficiently implement a distributed approximation of the Newton-Raphson method. We also present preliminary results which underline advantages and challenges of second-order methods for large stochastic optimization problems. In particular, our work suggests that novel strategies for combining gradients provide further information on the loss surface.
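The abstract does not spell out how worker differences are turned into the inverse-Hessian approximation. As an illustration only, the following sketch swaps in the standard L-BFGS two-loop recursion: each worker's offset from a shared reference point θ0 gives a curvature pair s_k = θ_k − θ0, y_k = g_k − g0, and the m pairs define a low-rank correction to a scaled identity that is applied to the gradient without ever forming the matrix. The pairing scheme, the positive-curvature filter, and the names `curvature_pairs`, `inverse_hessian_apply`, and `newton_like_step` are hypothetical and should not be read as the authors' method. It uses plain Python lists so it runs without NumPy.

```python
from typing import List, Sequence, Tuple

# Sketch (assumption): the L-BFGS two-loop recursion stands in for the
# paper's rank-m inverse-Hessian approximation; it is not the authors' code.
Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def axpy(alpha: float, x: Sequence[float], y: Sequence[float]) -> Vector:
    """Return alpha * x + y element-wise."""
    return [alpha * a + b for a, b in zip(x, y)]


def curvature_pairs(theta0: Sequence[float], grad0: Sequence[float],
                    worker_thetas: Sequence[Sequence[float]],
                    worker_grads: Sequence[Sequence[float]]
                    ) -> Tuple[List[Vector], List[Vector]]:
    """Hypothetical pairing: worker k contributes s = theta_k - theta0, y = g_k - g0."""
    s_list, y_list = [], []
    for theta_k, g_k in zip(worker_thetas, worker_grads):
        s = [a - b for a, b in zip(theta_k, theta0)]
        y = [a - b for a, b in zip(g_k, grad0)]
        # Keep only pairs with positive curvature (s^T y > 0); noisy
        # stochastic gradients can violate it and break the approximation.
        if dot(s, y) > 1e-12:
            s_list.append(s)
            y_list.append(y)
    return s_list, y_list


def inverse_hessian_apply(grad: Sequence[float], s_list: List[Vector],
                          y_list: List[Vector]) -> Vector:
    """Approximate H^{-1} @ grad from m curvature pairs (two-loop recursion)."""
    rhos = [1.0 / dot(y, s) for s, y in zip(s_list, y_list)]
    q = list(grad)
    alphas = []
    # First loop: newest pair to oldest.
    for s, y, rho in reversed(list(zip(s_list, y_list, rhos))):
        alpha = rho * dot(s, q)
        q = axpy(-alpha, y, q)
        alphas.append(alpha)
    # Initial approximation H0 = gamma * I, scaled from the newest pair.
    gamma = dot(s_list[-1], y_list[-1]) / dot(y_list[-1], y_list[-1])
    r = [gamma * v for v in q]
    # Second loop: oldest pair to newest (alphas were stored newest first).
    for (s, y, rho), alpha in zip(zip(s_list, y_list, rhos), reversed(alphas)):
        beta = rho * dot(y, r)
        r = axpy(alpha - beta, s, r)
    return r


def newton_like_step(theta0, grad0, worker_thetas, worker_grads,
                     lr: float = 1.0) -> Vector:
    """theta0 - lr * (approx. H^{-1}) grad0; plain SGD step if no usable pairs."""
    s_list, y_list = curvature_pairs(theta0, grad0, worker_thetas, worker_grads)
    if not s_list:
        return axpy(-lr, grad0, theta0)
    return axpy(-lr, inverse_hessian_apply(grad0, s_list, y_list), theta0)


# Usage: f(theta) = sum(0.5 * a_i * theta_i**2 - theta_i) with a = (2, 5, 10),
# whose minimizer is theta_i = 1 / a_i. Each hypothetical worker is displaced
# along one coordinate axis from the shared point theta0.
a = [2.0, 5.0, 10.0]


def grad_f(th: Sequence[float]) -> Vector:
    return [ai * t - 1.0 for ai, t in zip(a, th)]


theta0 = [0.0, 0.0, 0.0]
workers = [[0.1 if j == i else 0.0 for j in range(3)] for i in range(3)]
theta = newton_like_step(theta0, grad_f(theta0), workers,
                         [grad_f(w) for w in workers])
print([round(v, 6) for v in theta])  # → [0.5, 0.2, 0.1]
```

Because the Hessian in this toy quadratic is diagonal, coordinate-axis displacements are conjugate directions, so with m equal to the dimension the step lands on the minimizer exactly as a true Newton step would. With noisy minibatch gradients the pairs are only approximate, and the s^T y > 0 check is one simple guard against the curvature issues that make second-order methods harder in the stochastic setting.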
