LG AI Jul 19, 2021

Decoupled Reinforcement Learning to Stabilise Intrinsically-Motivated Exploration

Lukas Schäfer, Filippos Christianos, Josiah P. Hanna, Stefano V. Albrecht

arXiv:2107.08966v3 15.129 citations Has Code

Originality: Incremental advance

AI Analysis

This work addresses a practical problem for reinforcement learning practitioners: it offers a more stable and sample-efficient method for exploration in sparse-reward environments. The advance is incremental, since it builds on existing intrinsic reward techniques.

The paper tackles the instability and hyperparameter sensitivity of intrinsic rewards in reinforcement learning by introducing Decoupled RL (DeRL), a framework that trains separate policies for exploration and exploitation, resulting in improved robustness and sample efficiency with fewer interactions needed for convergence.

Intrinsic rewards can improve exploration in reinforcement learning, but the exploration process may suffer from instability caused by non-stationary reward shaping and strong dependency on hyperparameters. In this work, we introduce Decoupled RL (DeRL) as a general framework which trains separate policies for intrinsically-motivated exploration and exploitation. Such decoupling allows DeRL to leverage the benefits of intrinsic rewards for exploration while demonstrating improved robustness and sample efficiency. We evaluate DeRL algorithms in two sparse-reward environments with multiple types of intrinsic rewards. Our results show that DeRL is more robust to varying scale and rate of decay of intrinsic rewards and converges to the same evaluation returns as intrinsically-motivated baselines in fewer interactions. Lastly, we discuss the challenge of distribution shift and show that divergence constraint regularisers can successfully minimise instability caused by divergence of exploration and exploitation policies.
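
The decoupling idea in the abstract can be illustrated with a small tabular sketch. This is not the authors' implementation: the chain environment, the count-based bonus `intrinsic_scale / sqrt(count)`, the use of Q-learning for both policies, and every name and constant below are assumptions chosen for illustration. An exploration policy is trained on extrinsic plus intrinsic reward and collects all the data, while an exploitation policy is trained off-policy on the same transitions using the extrinsic reward only. The divergence constraint regularisers mentioned in the abstract are not modelled here.

```python
import random
from collections import defaultdict

# All constants are hypothetical choices for this toy example.
N_STATES = 8        # length of the sparse-reward chain
ACTIONS = (0, 1)    # 0 = move left, 1 = move right
GAMMA = 0.95        # discount factor
ALPHA = 0.5         # learning rate
EPSILON = 0.2       # random-action rate of the exploration policy


def step(state, action):
    """Deterministic chain: reward 1 only when the last state is reached."""
    if action == 0:
        nxt = max(0, state - 1)
    else:
        nxt = min(N_STATES - 1, state + 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def greedy(q, state):
    """Greedy action under q, breaking ties at random."""
    best = max(q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if q[(state, a)] == best])


def q_update(q, s, a, r, s2, done):
    """One Q-learning update on the transition (s, a, r, s2)."""
    target = r if done else r + GAMMA * max(q[(s2, b)] for b in ACTIONS)
    q[(s, a)] += ALPHA * (target - q[(s, a)])


def train(episodes=500, max_steps=50, intrinsic_scale=1.0, seed=0):
    random.seed(seed)
    q_explore = defaultdict(float)  # trained on extrinsic + intrinsic reward
    q_exploit = defaultdict(float)  # trained on extrinsic reward only
    counts = defaultdict(int)       # state visit counts for the bonus
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            # Only the exploration policy interacts with the environment.
            if random.random() < EPSILON:
                a = random.choice(ACTIONS)
            else:
                a = greedy(q_explore, s)
            s2, r_ext, done = step(s, a)
            counts[s2] += 1
            r_int = intrinsic_scale / counts[s2] ** 0.5  # decaying bonus
            q_update(q_explore, s, a, r_ext + r_int, s2, done)
            # Off-policy update: same transition, no intrinsic reward.
            q_update(q_exploit, s, a, r_ext, s2, done)
            s = s2
            if done:
                break
    return q_explore, q_exploit


def evaluate(q, max_steps=50):
    """Extrinsic return of one greedy rollout from the start state."""
    s, total = 0, 0.0
    for _ in range(max_steps):
        s, r, done = step(s, greedy(q, s))
        total += r
        if done:
            break
    return total
```

Calling `train()` and then `evaluate(q_exploit)` gives the extrinsic return of the exploitation policy. Because `q_exploit` never sees `r_int`, changing `intrinsic_scale` or the decay of the bonus alters only the data that is collected, not the objective the exploitation policy optimises, which is the intuition behind the robustness claim in the abstract.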
