LG AI RO | Dec 5, 2019

Training Agents using Upside-Down Reinforcement Learning

Rupesh Kumar Srivastava, Pranav Shyam, Filipe Mutz, Wojciech Jaśkowski, Jürgen Schmidhuber

arXiv:1912.02877v2 | 26.7 | 140 citations

Originality: Incremental advance

AI Analysis

The paper offers an alternative to expected reward maximization for training autonomous agents, though it appears incremental because it builds on supervised learning techniques rather than introducing a new paradigm.

The paper introduces Upside-Down Reinforcement Learning (UDRL), which trains agents using only supervised learning to follow commands such as obtaining a specified total reward within a specified time. It shows that UDRL can be competitive with, or exceed, traditional baseline algorithms on some tasks in episodic environments.

We develop Upside-Down Reinforcement Learning (UDRL), a method for learning to act using only supervised learning techniques. Unlike traditional algorithms, UDRL does not use reward prediction or search for an optimal policy. Instead, it trains agents to follow commands such as "obtain so much total reward in so much time." Many of its general principles are outlined in a companion report; the goal of this paper is to develop a practical learning algorithm and show that this conceptually simple perspective on agent training can produce a range of rewarding behaviors for multiple episodic environments. Experiments show that on some tasks UDRL's performance can be surprisingly competitive with, and even exceed that of some traditional baseline algorithms developed over decades of research. Based on these results, we suggest that alternative approaches to expected reward maximization have an important role to play in training useful autonomous agents.
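To make the idea of "following commands with supervised learning" concrete, here is a minimal sketch of how a recorded episode could be turned into supervised training pairs. It is an illustration under stated assumptions, not the authors' algorithm: the names `Step`, `Example`, and `relabel_episode` are hypothetical, and the command is simplified to "the reward actually collected from this step to the end of the episode, and the number of steps that took."

```python
from typing import List, NamedTuple, Tuple


class Step(NamedTuple):
    """One recorded transition (hypothetical container for this sketch)."""
    state: Tuple[float, ...]
    action: int
    reward: float


class Example(NamedTuple):
    """One supervised pair: (state, command) is the input, action is the target."""
    state: Tuple[float, ...]
    desired_return: float   # "obtain so much total reward ..."
    desired_horizon: int    # "... in so much time"
    action: int


def relabel_episode(episode: List[Step]) -> List[Example]:
    """Relabel an episode in hindsight.

    Assumption: the command attached to step t is what actually happened
    afterwards, i.e. the reward collected from t to the end of the episode
    and the number of remaining steps.
    """
    examples: List[Example] = []
    return_to_go = sum(step.reward for step in episode)
    for t, step in enumerate(episode):
        horizon = len(episode) - t
        examples.append(Example(step.state, return_to_go, horizon, step.action))
        return_to_go -= step.reward  # reward at t is no longer "in the future"
    return examples


if __name__ == "__main__":
    episode = [
        Step((0.0,), 1, 1.0),
        Step((1.0,), 0, 2.0),
        Step((2.0,), 1, 3.0),
    ]
    for example in relabel_episode(episode):
        print(example)
```

A classifier fit on such pairs maps a state and a command to an action, so no value function or policy gradient appears anywhere in the sketch. At run time, the same model would be queried with a command chosen by the user or by some exploration rule, which this sketch does not cover.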
