Evolving Reinforcement Learning Algorithms
This work addresses a problem for the machine learning community: automatically designing reinforcement learning algorithms that are more effective and generalize better.
This paper proposes a meta-learning method that searches over computational graphs to evolve reinforcement learning algorithms. Trained from scratch, it rediscovers the temporal-difference (TD) algorithm; bootstrapped from DQN, it learns algorithms that generalize well across classical control, gridworld, and Atari tasks. The learned algorithms resemble methods that address overestimation in value-based RL.
We propose a method for meta-learning reinforcement learning algorithms by searching over the space of computational graphs that compute the loss function a value-based, model-free RL agent optimizes. The learned algorithms are domain-agnostic and can generalize to new environments not seen during training. Our method can either learn from scratch or bootstrap from existing algorithms such as DQN, enabling interpretable modifications that improve performance. Learning from scratch on simple classical control and gridworld tasks, our method rediscovers the temporal-difference (TD) algorithm. Bootstrapping from DQN, we highlight two learned algorithms that obtain good generalization performance on other classical control tasks, gridworld-type tasks, and Atari games. Analysis of the learned algorithms' behavior shows a resemblance to recently proposed RL algorithms that address overestimation in value-based methods.
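To make the idea of "a computational graph that computes the loss" concrete, the following is a minimal sketch of the standard squared TD error (the loss DQN optimizes) written as a small graph of primitive operations. The node encoding, the `OPS` table, and the names `DQN_LOSS_GRAPH`, `evaluate`, and `td_loss` are hypothetical choices made for illustration; they are not the representation used in the paper.

```python
# Hypothetical graph encoding (an assumption for illustration only):
# each node is (output_name, op_name, [input_names]).
# Inputs assumed available to the loss graph:
#   q_sa     : Q(s_t, a_t) from the online network
#   q_next   : list of target-network values Q_targ(s_{t+1}, a) for every action a
#   r, gamma : reward and discount factor

OPS = {
    "max": lambda xs: max(xs),
    "mul": lambda a, b: a * b,
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "square": lambda a: a * a,
}

# Squared TD error, (Q(s,a) - (r + gamma * max_a' Q_targ(s',a')))^2,
# listed in topological order so each node's inputs are already computed.
DQN_LOSS_GRAPH = [
    ("max_next", "max", ["q_next"]),
    ("discounted", "mul", ["gamma", "max_next"]),
    ("target", "add", ["r", "discounted"]),
    ("delta", "sub", ["q_sa", "target"]),
    ("loss", "square", ["delta"]),
]


def evaluate(graph, inputs):
    """Evaluate the graph node by node and return the value of the 'loss' node."""
    values = dict(inputs)
    for name, op, args in graph:
        values[name] = OPS[op](*(values[a] for a in args))
    return values["loss"]


def td_loss(q_sa, q_next, r, gamma):
    """Convenience wrapper: squared TD error for a single transition."""
    inputs = {"q_sa": q_sa, "q_next": q_next, "r": r, "gamma": gamma}
    return evaluate(DQN_LOSS_GRAPH, inputs)
```

Under this encoding, a search procedure could propose a new candidate algorithm by changing a node's operation or rewiring its inputs, and then score the resulting loss by training agents with it. "Bootstrapping from DQN" would correspond to starting that search from a graph like `DQN_LOSS_GRAPH` rather than from a random one.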