LG · Sep 15, 2016

An overview of gradient descent optimization algorithms

arXiv:1609.04747v2 · 6922 citations
AI Analysis

This is an incremental overview for machine learning practitioners and researchers, intended to improve their understanding and application of existing optimization methods.

The paper addresses the tendency to use gradient descent optimization algorithms as black-box optimizers. It provides intuitive explanations of their behavior, strengths, and weaknesses, and presents no new experimental results or concrete numbers.

Gradient descent optimization algorithms, while increasingly popular, are often used as black-box optimizers, as practical explanations of their strengths and weaknesses are hard to come by. This article aims to provide the reader with intuitions with regard to the behaviour of different algorithms that will allow her to put them to use. In the course of this overview, we look at different variants of gradient descent, summarize challenges, introduce the most common optimization algorithms, review architectures in a parallel and distributed setting, and investigate additional strategies for optimizing gradient descent.

Code Implementations: 21 repos

Data from Papers with Code (CC-BY-SA-4.0)

