LG · AI · Oct 8, 2020

Learning Intrinsic Symbolic Rewards in Reinforcement Learning

arXiv:2010.03694v2 · 9 citations
Originality: Incremental advance
AI Analysis

This work addresses sparse-reward learning in RL, offering researchers and practitioners a more interpretable and effective reward-discovery method. It is rated incremental because it builds on existing reward-discovery approaches.

The paper tackles the challenge of learning effective policies for sparse objectives in reinforcement learning. It discovers dense rewards in the form of low-dimensional symbolic trees, which are easier to analyze than neural-network reward functions. The authors report that it significantly outperforms a contemporary neural-network reward-discovery algorithm in all environments considered (MuJoCo, Atari, and PyGame).

Learning effective policies for sparse objectives is a key challenge in Deep Reinforcement Learning (RL). A common approach is to design task-related dense rewards to improve task learnability. While such rewards are easily interpreted, they rely on heuristics and domain expertise. Alternate approaches that train neural networks to discover dense surrogate rewards avoid heuristics, but are high-dimensional, black-box solutions offering little interpretability. In this paper, we present a method that discovers dense rewards in the form of low-dimensional symbolic trees - thus making them more tractable for analysis. The trees use simple functional operators to map an agent's observations to a scalar reward, which then supervises the policy gradient learning of a neural network policy. We test our method on continuous action spaces in Mujoco and discrete action spaces in Atari and Pygame environments. We show that the discovered dense rewards are an effective signal for an RL policy to solve the benchmark tasks. Notably, we significantly outperform a widely used, contemporary neural-network based reward-discovery algorithm in all environments considered.
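To make the idea of a symbolic reward tree concrete, the following is a minimal sketch, not the authors' implementation. It assumes a tree whose leaves are observation components or constants and whose internal nodes are simple operators. The class names (`Obs`, `Const`, `Op`), the operator choices, and the example tree are all hypothetical; the abstract does not specify the operator set or how trees are searched for.

```python
import math
import operator
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class Obs:
    """Leaf node: reads one component of the observation vector."""
    index: int

    def evaluate(self, obs: Sequence[float]) -> float:
        return float(obs[self.index])


@dataclass(frozen=True)
class Const:
    """Leaf node: a fixed scalar constant."""
    value: float

    def evaluate(self, obs: Sequence[float]) -> float:
        return self.value


@dataclass(frozen=True)
class Op:
    """Internal node: applies a simple operator to its children's values."""
    fn: Callable[..., float]
    children: Tuple

    def evaluate(self, obs: Sequence[float]) -> float:
        return self.fn(*(child.evaluate(obs) for child in self.children))


# Hypothetical tree encoding: reward = cos(obs[0]) - 0.1 * obs[1]^2
# (an illustrative expression, not one reported in the paper).
reward_tree = Op(
    operator.sub,
    (
        Op(math.cos, (Obs(0),)),
        Op(operator.mul, (Const(0.1), Op(operator.mul, (Obs(1), Obs(1))))),
    ),
)

# Map an observation to a dense scalar reward.
dense_reward = reward_tree.evaluate([0.0, 2.0])
```

In a training loop, a scalar such as `dense_reward` would stand in for the sparse environment reward when computing the policy-gradient update. Because the tree is a short expression over named observation components, it can be printed and inspected directly, which is the interpretability argument the abstract makes.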
