LG · AI · Nov 14, 2022

Redeeming Intrinsic Rewards via Constrained Optimization

Eric Chen, Zhang-Wei Hong, Joni Pajarinen, Pulkit Agrawal

arXiv:2211.07627v2 · 18.141 citations · h-index: 40 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses a key bottleneck for RL practitioners by enabling reliable use of intrinsic rewards without manual tuning. The advance is incremental because it builds on prior intrinsic reward methods.

The paper tackles the problem of inconsistent performance in reinforcement learning when using intrinsic rewards for exploration, proposing a constrained optimization method that automatically adjusts the importance of intrinsic rewards, resulting in consistent performance gains across 61 ATARI games.

State-of-the-art reinforcement learning (RL) algorithms typically use random sampling (e.g., $ε$-greedy) for exploration, but this method fails on hard exploration tasks like Montezuma's Revenge. To address the challenge of exploration, prior works incentivize exploration by rewarding the agent when it visits novel states. Such intrinsic rewards (also called exploration bonus or curiosity) often lead to excellent performance on hard exploration tasks. However, on easy exploration tasks, the agent gets distracted by intrinsic rewards and performs unnecessary exploration even when sufficient task (also called extrinsic) reward is available. Consequently, such an overly curious agent performs worse than an agent trained with only task reward. Such inconsistency in performance across tasks prevents the widespread use of intrinsic rewards with RL algorithms. We propose a principled constrained optimization procedure called Extrinsic-Intrinsic Policy Optimization (EIPO) that automatically tunes the importance of the intrinsic reward: it suppresses the intrinsic reward when exploration is unnecessary and increases it when exploration is required. The result is superior exploration that does not require manual tuning in balancing the intrinsic reward against the task reward. Consistent performance gains across sixty-one ATARI games validate our claim. The code is available at https://github.com/Improbable-AI/eipo.
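The abstract describes the tuning rule only in words, so the following is a minimal sketch of how such an automatic adjustment could look, assuming a generic Lagrangian-style treatment of the constraint "the exploring policy should not earn less task reward than a policy trained on task reward alone". It is not taken from the EIPO code. The function names, the `(1 + alpha)` weighting, the step size, and the clipping bound are all hypothetical choices made for illustration.

```python
def mixed_reward(r_ext: float, r_int: float, alpha: float) -> float:
    """Combine task and intrinsic reward (assumed form, for illustration).

    A larger multiplier alpha puts more weight on the task reward, which
    lowers the relative importance of the intrinsic bonus.
    """
    return (1.0 + alpha) * r_ext + r_int


def intrinsic_weight(alpha: float) -> float:
    """Relative importance of the intrinsic reward under mixed_reward."""
    return 1.0 / (1.0 + alpha)


def update_multiplier(
    alpha: float,
    j_ext_mixed: float,   # task return of the policy trained on mixed reward
    j_ext_only: float,    # task return of a reference policy trained on task reward only
    step_size: float = 0.1,   # hypothetical learning rate for the multiplier
    alpha_max: float = 10.0,  # hypothetical upper bound to keep alpha finite
) -> float:
    """One dual-descent step on the multiplier (assumed update rule).

    If the mixed-reward policy earns less task return than the reference,
    the gap is negative and alpha grows, suppressing the intrinsic reward.
    If it earns more, alpha shrinks and the intrinsic reward regains weight.
    """
    gap = j_ext_mixed - j_ext_only
    new_alpha = alpha - step_size * gap
    return min(max(new_alpha, 0.0), alpha_max)


if __name__ == "__main__":
    alpha = 0.0
    # Exploring policy lags the task-only reference: intrinsic weight should drop.
    alpha = update_multiplier(alpha, j_ext_mixed=1.0, j_ext_only=3.0, step_size=0.5)
    print(alpha, intrinsic_weight(alpha))
    # Exploring policy now beats the reference: intrinsic weight should recover.
    alpha = update_multiplier(alpha, j_ext_mixed=5.0, j_ext_only=3.0, step_size=0.5)
    print(alpha, intrinsic_weight(alpha))
```

In this sketch the multiplier is the only tuned quantity: it rises when the intrinsic bonus appears to hurt task return and falls back toward zero when exploration helps, which mirrors the suppress-or-increase behavior the abstract describes without requiring a hand-picked reward coefficient.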
