ML · LG · Nov 11, 2016

Learning to Learn without Gradient Descent by Gradient Descent

arXiv:1611.03824v6 · 121 citations
Originality: Incremental advance
AI Analysis

This work addresses efficient black-box optimization across a broad range of applications, including hyper-parameter tuning. It is rated incremental because it builds on existing gradient descent and neural network methods.

The paper tackles derivative-free optimization of black-box functions by training recurrent neural network optimizers on synthetic functions, achieving performance comparable to heavily engineered Bayesian optimization packages on hyper-parameter tuning tasks.

We learn recurrent neural network optimizers trained on simple synthetic functions by gradient descent. We show that these learned optimizers exhibit a remarkable degree of transfer in that they can be used to efficiently optimize a broad range of derivative-free black-box functions, including Gaussian process bandits, simple control objectives, global optimization benchmarks and hyper-parameter tuning tasks. Up to the training horizon, the learned optimizers learn to trade off exploration and exploitation, and compare favourably with heavily engineered Bayesian optimization packages for hyper-parameter tuning.
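To make the setup in the abstract concrete, the following is a minimal sketch of how a recurrent optimizer might query a black-box function: at each step it sees only its previous query and the observed value, and proposes the next query. Everything named here (`rnn_optimizer_rollout`, the vanilla tanh cell, the hidden size, the `sphere` test function) is an illustrative assumption, not the paper's architecture or code. The weights are random and untrained; in the setting described above they would instead be fitted by gradient descent on synthetic functions.

```python
import numpy as np


def rnn_optimizer_rollout(f, dim, horizon, hidden=16, seed=0):
    """Unroll an untrained vanilla RNN that proposes query points for f.

    Hypothetical sketch: the parameters below are random stand-ins for
    weights that would normally be learned on synthetic training functions.
    """
    rng = np.random.default_rng(seed)
    W_in = rng.normal(scale=0.5, size=(hidden, dim + 1))  # reads (x, y)
    W_h = rng.normal(scale=0.5, size=(hidden, hidden))    # recurrent weights
    b_h = rng.normal(scale=0.5, size=hidden)              # hidden bias
    W_out = rng.normal(scale=0.5, size=(dim, hidden))     # maps state to query

    h = np.zeros(hidden)  # recurrent state, carries the query history
    x = np.zeros(dim)     # dummy "previous query" for the first step
    y = 0.0               # dummy "previous observation" for the first step
    queries, values = [], []
    for _ in range(horizon):
        inp = np.concatenate([x, [y]])
        h = np.tanh(W_in @ inp + W_h @ h + b_h)
        x = np.tanh(W_out @ h)  # squash so queries stay inside [-1, 1]^dim
        y = float(f(x))         # only a function value is observed, no gradient
        queries.append(x)
        values.append(y)
    return np.array(queries), np.array(values)


def sphere(x):
    """Toy black-box objective (an assumption, used only for illustration)."""
    return float(np.sum(x ** 2))


queries, values = rnn_optimizer_rollout(sphere, dim=2, horizon=10)
summed_loss = values.sum()  # one candidate training signal over the horizon
best_so_far = np.minimum.accumulate(values)
```

Because the horizon is fixed when the loop is unrolled, an optimizer trained this way has only ever been shaped by trajectories of that length, which is consistent with the abstract's caveat that the exploration and exploitation trade-off holds "up to the training horizon". Using the summed function values as the training signal is an assumption made for this sketch.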
