OC LG, Feb 12, 2016

Using Deep Q-Learning to Control Optimization Hyperparameters

arXiv:1602.04062v2, 15.344 citations

Originality: Incremental advance

AI Analysis

This work addresses the challenge of hyperparameter tuning in optimization, particularly for neural networks, by introducing a reinforcement learning-based approach, though it appears incremental as it builds on existing Q-learning methods.

The authors tackled the problem of controlling optimization hyperparameters, specifically the learning rate, by using deep Q-learning to adjust it dynamically, resulting in Q-gradient descent algorithms that outperform gradient descent with Armijo or nonmonotone line searches.
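For context on the baseline named above, the following is a minimal sketch of gradient descent with an Armijo backtracking line search. It is not taken from the paper: the function name `armijo_gradient_descent`, the constants `alpha0`, `c`, and `shrink`, and the quadratic test objective are all illustrative assumptions.

```python
def armijo_gradient_descent(f, grad, x0, alpha0=1.0, c=1e-4, shrink=0.5, iters=50):
    """Gradient descent with Armijo backtracking (illustrative baseline only).

    f and grad take a list of floats; all default constants are assumptions.
    """
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        g_sq = sum(gi * gi for gi in g)  # squared gradient norm
        if g_sq == 0.0:
            break  # stationary point reached
        fx = f(x)
        alpha = alpha0  # restart from the initial learning rate every iteration
        # Shrink alpha until the sufficient-decrease (Armijo) condition holds:
        #   f(x - alpha * g) <= f(x) - c * alpha * ||g||^2
        while (f([xi - alpha * gi for xi, gi in zip(x, g)]) > fx - c * alpha * g_sq
               and alpha > 1e-12):
            alpha *= shrink
        x = [xi - alpha * gi for xi, gi in zip(x, g)]
    return x


# Usage on a hypothetical ill-conditioned quadratic: f(x) = x0^2 + 10 * x1^2
def quadratic(x):
    return x[0] ** 2 + 10.0 * x[1] ** 2


def quadratic_grad(x):
    return [2.0 * x[0], 20.0 * x[1]]


x_min = armijo_gradient_descent(quadratic, quadratic_grad, [3.0, -2.0], iters=200)
```

The inner `while` loop is the fixed, hand-designed rule for shrinking the learning rate. The approach summarized here replaces that kind of rule with a learned policy.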

We present a novel definition of the reinforcement learning state, actions, and reward function that allows a deep Q-network (DQN) to learn to control an optimization hyperparameter. Using Q-learning with experience replay, we train two DQNs to accept a state representation of an objective function as input and output the expected discounted return of rewards, or q-values, connected to the actions of either adjusting the learning rate or leaving it unchanged. The two DQNs learn a policy similar to a line search, but differ in the number of allowed actions. The trained DQNs in combination with a gradient-based update routine form the basis of the Q-gradient descent algorithms. To demonstrate the viability of this framework, we show that the DQN's q-values associated with the optimal action converge and that the Q-gradient descent algorithms outperform gradient descent with an Armijo or nonmonotone line search. Unlike traditional optimization methods, Q-gradient descent can incorporate any objective statistic, and by varying the actions we gain insight into the type of learning rate adjustment strategies that are successful for neural network optimization.
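The pieces of Q-learning with experience replay mentioned in the abstract can be sketched as follows. This is a generic illustration under stated assumptions, not the authors' implementation: the action names, the `make_state` features, and every function and class name here are hypothetical, and no neural network is included (a list of q-values stands in for the DQN's output).

```python
import random
from collections import deque

# Hypothetical two-action set: adjust the learning rate or leave it unchanged.
ACTIONS = ("halve_learning_rate", "keep_learning_rate")


def make_state(f_prev, f_curr, grad_sq, lr):
    """Assumed state features: change in objective, squared gradient norm, learning rate."""
    return (f_curr - f_prev, grad_sq, lr)


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the highest q-value."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_target(reward, next_q_values, gamma, done):
    """Standard Q-learning target: r + gamma * max_a' Q(s', a'), or r at episode end."""
    return reward if done else reward + gamma * max(next_q_values)


class ReplayBuffer:
    """Fixed-capacity store of transitions, sampled uniformly for training."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are dropped first

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        return rng.sample(list(self.buffer), min(batch_size, len(self.buffer)))


# Usage: record one transition and compute its training target.
buffer = ReplayBuffer(capacity=1000)
s = make_state(f_prev=5.0, f_curr=4.0, grad_sq=2.0, lr=0.1)
s_next = make_state(f_prev=4.0, f_curr=3.5, grad_sq=1.5, lr=0.1)
action = epsilon_greedy([0.2, 0.6], epsilon=0.0)  # greedy choice
buffer.add(s, action, 1.0, s_next, False)
target = td_target(reward=1.0, next_q_values=[0.5, 2.0], gamma=0.9, done=False)
```

In a full training loop, a network would be fit so that its q-value for the stored action moves toward `target` on each sampled batch. The reward of `1.0` above is a placeholder, since the reward function is one of the paper's contributions and is not specified in this summary.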
