LG, AI | Sep 15, 2022

Optimistic Curiosity Exploration and Conservative Exploitation with Linear Reward Shaping

Hao Sun, Lei Han, Rui Yang, Xiaoteng Ma, Jian Guo, Bolei Zhou

Cambridge

arXiv:2209.07288v27.812 citations | h-index: 72 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the exploration-exploitation dilemma in reinforcement learning. For researchers and practitioners, it offers a simple, incremental method, a constant shift of the reward, that applies to value-based RL in both offline and online settings.

The paper tackles reward shaping in deep reinforcement learning by showing that linear reward shifting is equivalent to changing Q-function initialization, leading to conservative exploitation for offline RL and curiosity-driven exploration for online RL. Results include improved performance in offline RL, better sample efficiency in online continuous control, and enhanced exploration in discrete control tasks over baselines.

In this work, we study the simple yet universally applicable case of reward shaping in value-based Deep Reinforcement Learning (DRL). We show that reward shifting in the form of a linear transformation is equivalent to changing the initialization of the $Q$-function in function approximation. Based on this equivalence, we bring the key insight that a positive reward shift leads to conservative exploitation, while a negative reward shift leads to curiosity-driven exploration. Accordingly, conservative exploitation improves offline RL value estimation, and optimistic value estimation improves exploration for online RL. We validate our insight on a range of RL tasks and show improvements over baselines: (1) in offline RL, conservative exploitation improves performance when built on off-the-shelf algorithms; (2) in online continuous control, multiple value functions with different shifting constants can be used to tackle the exploration-exploitation dilemma for better sample efficiency; (3) in discrete control tasks, a negative reward shift yields an improvement over the curiosity-based exploration method.
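The claimed equivalence between reward shifting and $Q$-function initialization can be checked numerically in a simplified setting. The sketch below is not the authors' code. It assumes a tabular $Q$-learning update, a continuing task with no terminal states, and a randomly generated batch of transitions; the function names `q_learning` and `random_transitions` are hypothetical. Under those assumptions, adding a constant $b$ to every reward moves all $Q$-values by $b/(1-\gamma)$, so training on shifted rewards from a zero-initialized table should match training on the original rewards from a table initialized at $-b/(1-\gamma)$. With $b < 0$ that initial value is positive, i.e. optimistic.

```python
import numpy as np


def q_learning(transitions, n_states, n_actions, q_init, shift, gamma=0.9, lr=0.5):
    """Tabular Q-learning over a fixed list of (s, a, r, s_next) transitions.

    `shift` is the constant added to every reward; `q_init` is the value
    every table entry starts from. No terminal states are assumed.
    """
    q = np.full((n_states, n_actions), q_init, dtype=float)
    for s, a, r, s_next in transitions:
        target = (r + shift) + gamma * q[s_next].max()
        q[s, a] += lr * (target - q[s, a])
    return q


def random_transitions(n_states, n_actions, n_steps, seed=0):
    """Hypothetical data: random transitions with a fixed reward per (s, a)."""
    rng = np.random.default_rng(seed)
    rewards = rng.normal(size=(n_states, n_actions))
    batch = []
    for _ in range(n_steps):
        s = int(rng.integers(n_states))
        a = int(rng.integers(n_actions))
        s_next = int(rng.integers(n_states))
        batch.append((s, a, float(rewards[s, a]), s_next))
    return batch


gamma = 0.9
shift = -1.0                      # negative shift b
offset = shift / (1.0 - gamma)    # b / (1 - gamma), the induced change in Q

data = random_transitions(n_states=4, n_actions=2, n_steps=200)

# (a) shifted rewards, zero initialization
q_shifted = q_learning(data, 4, 2, q_init=0.0, shift=shift, gamma=gamma)
# (b) original rewards, initialization moved by -b / (1 - gamma)
q_reinit = q_learning(data, 4, 2, q_init=-offset, shift=0.0, gamma=gamma)

# After removing the constant offset, the two tables should agree
# up to floating-point error.
gap = np.abs((q_shifted - offset) - q_reinit).max()
print("optimistic initial value:", -offset)
print("max difference between the two tables:", gap)
```

The two runs see the same transitions in the same order, so any remaining gap comes from floating-point rounding alone. Episodic tasks with terminal states and neural function approximators are outside what this tabular check covers.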
