Using Deep Q-Learning to Control Optimization Hyperparameters
This work addresses hyperparameter tuning in optimization, particularly for neural networks, with a reinforcement-learning-based approach; the contribution appears incremental, since it builds on existing Q-learning methods.
The authors control an optimization hyperparameter, specifically the learning rate, by using deep Q-learning to adjust it during optimization; the resulting Q-gradient descent algorithms outperform gradient descent with an Armijo or nonmonotone line search.
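Both baselines are backtracking line searches. The following is a minimal illustration, not the paper's implementation. The function name `line_search_gd`, the constants `alpha0`, `c`, and `rho`, and the `memory` window are all assumptions. With `memory=1` it is gradient descent with the Armijo sufficient-decrease rule. With `memory>1` it switches to one common nonmonotone rule, which tests against the largest of the last few objective values instead of the current one. Whether the paper uses this particular nonmonotone variant is also an assumption.

```python
from typing import Callable, List

Vector = List[float]


def line_search_gd(
    f: Callable[[Vector], float],
    grad: Callable[[Vector], Vector],
    x0: Vector,
    memory: int = 1,      # 1 = monotone Armijo; >1 = nonmonotone (max of last `memory` values)
    alpha0: float = 1.0,  # initial trial step size (assumed)
    c: float = 1e-4,      # sufficient-decrease constant (assumed)
    rho: float = 0.5,     # backtracking shrink factor (assumed)
    max_iter: int = 500,
    tol: float = 1e-8,
) -> Vector:
    """Gradient descent whose step size is chosen by backtracking line search."""
    x = list(x0)
    history = [f(x)]
    for _ in range(max_iter):
        g = grad(x)
        g_sq = sum(gi * gi for gi in g)
        if g_sq < tol * tol:  # stop once the gradient norm is below tol
            break
        ref = max(history[-memory:])  # reference value for the decrease test
        alpha = alpha0
        # Shrink alpha until f(x - alpha * g) <= ref - c * alpha * ||g||^2.
        while alpha > 1e-12 and f([xi - alpha * gi for xi, gi in zip(x, g)]) > ref - c * alpha * g_sq:
            alpha *= rho
        x = [xi - alpha * gi for xi, gi in zip(x, g)]
        history.append(f(x))
    return x
```

On an ill-conditioned quadratic such as `f(x) = (x[0] - 3)**2 + 10 * (x[1] + 1)**2`, the monotone rule converges to `(3, -1)`. The nonmonotone rule accepts some steps that temporarily increase `f`, but no value ever exceeds the starting value.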
We present a novel definition of the reinforcement learning state, actions, and reward function that allows a deep Q-network (DQN) to learn to control an optimization hyperparameter. Using Q-learning with experience replay, we train two DQNs to accept a state representation of an objective function as input and output the expected discounted return of rewards, or q-values, associated with the actions of either adjusting the learning rate or leaving it unchanged. The two DQNs learn a policy similar to a line search, but differ in the number of allowed actions. Combined with a gradient-based update routine, the trained DQNs form the basis of the Q-gradient descent algorithms. To demonstrate the viability of this framework, we show that the DQN's q-values associated with the optimal action converge and that the Q-gradient descent algorithms outperform gradient descent with an Armijo or nonmonotone line search. Unlike traditional optimization methods, Q-gradient descent can incorporate any objective statistic, and by varying the actions we gain insight into the types of learning rate adjustment strategies that are successful for neural network optimization.
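The pipeline described above has two stages. First, Q-learning with experience replay trains a Q-function over learning-rate actions. Second, a greedy policy built on that Q-function drives a gradient update, which yields Q-gradient descent. The sketch below illustrates both stages and rests on several explicit assumptions. A linear Q-function stands in for the deep Q-network, and the training objectives are random diagonal quadratics. The state features, the three-action set (halve, keep, or double the learning rate), the reward (log decrease of the objective), the learning-rate bounds, and every name and hyperparameter are hypothetical choices, not the authors' definitions.

```python
import math
import random
from collections import deque
from typing import List

# Hypothetical action set: multiply the learning rate by one of these factors.
ACTIONS = (0.5, 1.0, 2.0)    # halve, leave unchanged, double (assumed)
LR_MIN, LR_MAX = 1e-4, 0.19  # assumed bounds; LR_MAX < 2 / (max curvature 10) below


def make_quadratic(rng: random.Random, dim: int = 5):
    """Toy objective f(x) = 0.5 * sum(h_i * x_i^2) with h_i drawn from [1, 10]."""
    h = [rng.uniform(1.0, 10.0) for _ in range(dim)]

    def f(x: List[float]) -> float:
        return 0.5 * sum(hi * xi * xi for hi, xi in zip(h, x))

    def grad(x: List[float]) -> List[float]:
        return [hi * xi for hi, xi in zip(h, x)]

    return f, grad


def state(f_prev: float, f_curr: float, lr: float) -> List[float]:
    """Hypothetical state: bias, clipped relative change in f, log10 learning rate."""
    rel = (f_curr - f_prev) / (abs(f_prev) + 1e-12)
    return [1.0, max(-1.0, min(1.0, rel)), math.log10(lr)]


class LinearQ:
    """Linear Q(s, a) = w_a . s, standing in for a deep Q-network."""

    def __init__(self, n_features: int = 3) -> None:
        self.w = [[0.0] * n_features for _ in ACTIONS]

    def q(self, s: List[float]) -> List[float]:
        return [sum(wi * si for wi, si in zip(w, s)) for w in self.w]

    def sgd_step(self, s: List[float], a: int, target: float, step: float) -> None:
        err = target - self.q(s)[a]  # TD error for the action that was taken
        self.w[a] = [wi + step * err * si for wi, si in zip(self.w[a], s)]


def gd_step(grad, x, lr, a):
    """Apply action a to the learning rate, then take one gradient step."""
    lr = min(LR_MAX, max(LR_MIN, lr * ACTIONS[a]))
    return [xi - lr * gi for xi, gi in zip(x, grad(x))], lr


def train(episodes=50, horizon=30, gamma=0.9, eps=0.2, batch=16, step=0.01, seed=0):
    """Q-learning with experience replay over randomly drawn quadratics."""
    rng = random.Random(seed)
    qnet, replay = LinearQ(), deque(maxlen=5000)
    for _ in range(episodes):
        f, grad = make_quadratic(rng)
        x, lr = [rng.uniform(-1.0, 1.0) for _ in range(5)], 0.01
        f_prev = f_curr = f(x)
        for t in range(horizon):
            s = state(f_prev, f_curr, lr)
            qs = qnet.q(s)
            # Epsilon-greedy exploration over the learning-rate actions.
            a = rng.randrange(len(ACTIONS)) if rng.random() < eps else qs.index(max(qs))
            x, lr = gd_step(grad, x, lr, a)
            f_new = f(x)
            r = math.log(f_curr + 1e-300) - math.log(f_new + 1e-300)  # assumed reward
            replay.append((s, a, r, state(f_curr, f_new, lr), t == horizon - 1))
            f_prev, f_curr = f_curr, f_new
            # Replay: regress Q(s, a) toward r + gamma * max_a' Q(s', a') on a minibatch.
            for bs, ba, br, bs2, done in rng.sample(list(replay), min(batch, len(replay))):
                target = br if done else br + gamma * max(qnet.q(bs2))
                qnet.sgd_step(bs, ba, target, step)
    return qnet


def q_gradient_descent(qnet, f, grad, x0, lr=0.01, steps=50):
    """Use the trained Q-function greedily to adjust the learning rate at each step."""
    x = list(x0)
    f_prev = f_curr = f(x)
    for _ in range(steps):
        qs = qnet.q(state(f_prev, f_curr, lr))
        x, lr = gd_step(grad, x, lr, qs.index(max(qs)))
        f_prev, f_curr = f_curr, f(x)
    return x, lr
```

To use it, call `train()` once, then run `q_gradient_descent` on a fresh objective from `make_quadratic`. The learning-rate clip is specific to this toy setup. Because every curvature is at most 10 and `LR_MAX < 0.2`, each step decreases these quadratics whichever action the Q-function selects. The learned policy therefore only affects how fast `f` falls, not whether it falls. Nothing here reproduces the paper's state representation or reported results.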