LG | NE | ML | Oct 16, 2018

Evolutionary Stochastic Gradient Descent for Optimization of Deep Neural Networks

arXiv:1810.06773v1 | 101 citations
Originality: Incremental advance
AI Analysis

This work addresses optimization challenges for deep learning practitioners, presenting an incremental hybrid method.

The paper tackles the optimization of deep neural networks by proposing Evolutionary Stochastic Gradient Descent (ESGD), a framework that alternates between SGD and evolutionary algorithms to improve population fitness, and demonstrates its effectiveness in speech recognition, image recognition, and language modeling with various deep architectures.

We propose a population-based Evolutionary Stochastic Gradient Descent (ESGD) framework for optimizing deep neural networks. ESGD combines SGD and gradient-free evolutionary algorithms as complementary algorithms in one framework in which the optimization alternates between the SGD step and evolution step to improve the average fitness of the population. With a back-off strategy in the SGD step and an elitist strategy in the evolution step, it guarantees that the best fitness in the population will never degrade. In addition, individuals in the population optimized with various SGD-based optimizers using distinct hyper-parameters in the SGD step are considered as competing species in a coevolution setting such that the complementarity of the optimizers is also taken into account. The effectiveness of ESGD is demonstrated across multiple applications including speech recognition, image recognition and language modeling, using networks with a variety of deep architectures.
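The alternation described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the quadratic `fitness` function, the noisy gradient, the averaging-plus-mutation recombination, and all names and parameter values (`sgd_step`, `evolution_step`, `esgd`, `n_elite`, `sigma`) are assumptions chosen for illustration. It only aims to show how a back-off check in the SGD step and elite preservation in the evolution step together keep the best fitness in the population from degrading.

```python
import random


def fitness(w):
    # Toy objective (lower is better); stands in for a validation loss.
    return sum(x * x for x in w)


def grad(w):
    # Exact gradient of the toy objective.
    return [2.0 * x for x in w]


def sgd_step(w, lr, steps, rng):
    """Run a few noisy gradient steps, then back off to the parent
    if the result is not at least as fit as the starting point."""
    cand = list(w)
    for _ in range(steps):
        g = grad(cand)
        # Gaussian noise mimics the stochasticity of mini-batch gradients.
        cand = [x - lr * (gi + rng.gauss(0.0, 0.1)) for x, gi in zip(cand, g)]
    return cand if fitness(cand) <= fitness(w) else w


def evolution_step(pop, n_elite, sigma, rng):
    """Keep the n_elite fittest individuals unchanged and refill the
    population with mutated averages of two randomly chosen elites."""
    ranked = sorted(pop, key=fitness)
    elites = ranked[:n_elite]
    children = []
    while len(elites) + len(children) < len(pop):
        a, b = rng.sample(elites, 2)
        children.append([(x + y) / 2 + rng.gauss(0.0, sigma) for x, y in zip(a, b)])
    return elites + children


def esgd(pop_size=8, dim=5, generations=10, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(pop_size)]
    # A distinct learning rate per population slot, loosely standing in
    # for the "competing species" with different optimizer settings.
    lrs = [0.01 * (i + 1) for i in range(pop_size)]
    best_history = [min(fitness(w) for w in pop)]
    for _ in range(generations):
        pop = [sgd_step(w, lr, 5, rng) for w, lr in zip(pop, lrs)]
        pop = evolution_step(pop, n_elite=pop_size // 2, sigma=0.05, rng=rng)
        best_history.append(min(fitness(w) for w in pop))
    return best_history


if __name__ == "__main__":
    history = esgd()
    print("best fitness per generation:", [round(f, 4) for f in history])
```

In this sketch the best fitness is non-increasing by construction: the back-off check means no individual leaves the SGD step worse than it entered, and the elites, which include the current best, pass through the evolution step untouched.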

Code Implementations: 1 repo