LG · Feb 24, 2021

Synthetic Returns for Long-Term Credit Assignment

David Raposo, Sam Ritter, Adam Santoro, Greg Wayne, Theophane Weber, Matt Botvinick, Hado van Hasselt, Francis Song

arXiv:2102.12425v1 · 17.236 citations · h-index: 31

Originality: Incremental advance

AI Analysis

This paper addresses a fundamental bottleneck in reinforcement learning, namely credit assignment in tasks with delayed rewards, though it is an incremental improvement over existing methods.

The paper tackles long-term credit assignment in reinforcement learning by introducing state-associative learning, which models the contribution of past states to future rewards and propagates credit directly between them. The authors report that their agent solves Atari Skiing 25 times faster than the published state of the art.

Since the earliest days of reinforcement learning, the workhorse method for assigning credit to actions over time has been temporal-difference (TD) learning, which propagates credit backward timestep-by-timestep. This approach suffers when delays between actions and rewards are long and when intervening unrelated events contribute variance to long-term returns. We propose state-associative (SA) learning, where the agent learns associations between states and arbitrarily distant future rewards, then propagates credit directly between the two. In this work, we use SA-learning to model the contribution of past states to the current reward. With this model we can predict each state's contribution to the far future, a quantity we call "synthetic returns". TD-learning can then be applied to select actions that maximize these synthetic returns (SRs). We demonstrate the effectiveness of augmenting agents with SRs across a range of tasks on which TD-learning alone fails. We show that the learned SRs are interpretable: they spike for states that occur after critical actions are taken. Finally, we show that our IMPALA-based SR agent solves Atari Skiing -- a game with a lengthy reward delay that posed a major hurdle to deep-RL agents -- 25 times faster than the published state-of-the-art.
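The core idea in the abstract, learning how much each past state contributes to the current reward and then reading off a per-state "synthetic return", can be illustrated with a small tabular sketch. Everything here is an assumption made for illustration and is not taken from the paper: the toy key-and-door episode, the state names, the pairwise table `C[s_k, s_t]`, and the use of ordinary least squares in place of a learned neural model.

```python
import numpy as np

# Hypothetical toy states: seeing KEY first is the only thing that matters for the door reward.
NO_KEY, KEY, DISTRACT_A, DISTRACT_B, DOOR = range(5)
N_STATES = 5

def sample_episode(rng):
    """Toy delayed-reward episode: the door pays 1 only if the key was seen first."""
    first = int(rng.integers(0, 2))          # NO_KEY (0) or KEY (1)
    n_distractors = int(rng.integers(1, 6))  # 1 to 5 unrelated intervening steps
    middle = rng.integers(DISTRACT_A, DISTRACT_B + 1, size=n_distractors)
    states = [first, *middle.tolist(), DOOR]
    rewards = [0.0] * (len(states) - 1) + [float(first == KEY)]
    return states, rewards

def fit_contributions(episodes):
    """Least-squares fit of C so that r_t is approximated by sum_{k<t} C[s_k, s_t]."""
    rows, targets = [], []
    for states, rewards in episodes:
        past_counts = np.zeros(N_STATES)
        for s_t, r_t in zip(states, rewards):
            feat = np.zeros((N_STATES, N_STATES))
            feat[:, s_t] = past_counts       # counts of earlier states, stored in column s_t
            rows.append(feat.ravel())
            targets.append(r_t)
            past_counts[s_t] += 1
    X, y = np.asarray(rows), np.asarray(targets)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # minimum-norm least-squares solution
    return coef.reshape(N_STATES, N_STATES)

def synthetic_returns(states, C):
    """For each step k, the modeled contribution of s_k to rewards at later steps."""
    return [sum(C[s_k, s_t] for s_t in states[k + 1:]) for k, s_k in enumerate(states)]

rng = np.random.default_rng(0)
episodes = [sample_episode(rng) for _ in range(200)]
C = fit_contributions(episodes)
print(np.round(C[:, DOOR], 3))
print(np.round(synthetic_returns([KEY, DISTRACT_A, DISTRACT_B, DOOR], C), 3))
```

In this toy setup the fitted contributions toward the door reward should be close to 1 for `KEY` and close to 0 for the other states, so the synthetic return is concentrated on the step where the key is seen, regardless of how many distractor steps follow. An agent could then treat these values as an additional reward signal for ordinary TD-learning, which is the role the abstract describes for synthetic returns.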
