Automatic Intrinsic Reward Shaping for Exploration in Deep Reinforcement Learning
This work addresses exploration challenges in reinforcement learning for AI agents, but its contribution is incremental: it builds on existing intrinsic reward methods by adding adaptive selection among them.
The paper tackles exploration in deep reinforcement learning by introducing AIRS, an automatic intrinsic reward shaping method that adaptively selects shaping functions based on the estimated task return. The authors report that AIRS outperforms the benchmarking schemes on tasks from MiniGrid, Procgen, and the DeepMind Control Suite.
We present AIRS (Automatic Intrinsic Reward Shaping), a method that intelligently and adaptively provides high-quality intrinsic rewards to enhance exploration in reinforcement learning (RL). More specifically, AIRS selects a shaping function from a predefined set in real time based on the estimated task return, providing reliable exploration incentives and alleviating the biased objective problem. Moreover, we develop an intrinsic reward toolkit to provide efficient and reliable implementations of diverse intrinsic reward approaches. We test AIRS on various tasks from MiniGrid, Procgen, and the DeepMind Control Suite. Extensive simulations demonstrate that AIRS outperforms the benchmarking schemes and achieves superior performance with a simple architecture.
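To make the idea of selecting a shaping function from a predefined set based on estimated task return more concrete, here is a minimal sketch. It assumes a UCB1-style bandit rule, which the text above does not specify. The class name `ShapingSelector`, the shaping function names, the exploration constant `c`, and the toy return values are all hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import math


class ShapingSelector:
    """Hypothetical UCB1-style selector over a fixed set of shaping functions.

    Assumption: each shaping function is treated as a bandit arm whose value
    is the running mean of the task return estimated while it was active.
    """

    def __init__(self, names, c=1.0):
        self.names = list(names)
        self.c = c  # exploration constant (assumed, not from the paper)
        self.counts = {n: 0 for n in self.names}
        self.values = {n: 0.0 for n in self.names}
        self.total = 0

    def select(self):
        # Try every shaping function once before comparing scores.
        for n in self.names:
            if self.counts[n] == 0:
                return n

        def score(n):
            bonus = self.c * math.sqrt(math.log(self.total) / self.counts[n])
            return self.values[n] + bonus

        return max(self.names, key=score)

    def update(self, name, estimated_return):
        # Incremental running mean of the estimated task return.
        self.counts[name] += 1
        self.total += 1
        self.values[name] += (estimated_return - self.values[name]) / self.counts[name]


def run_toy(rounds):
    """Drive the selector with fixed, made-up returns per shaping function."""
    toy_returns = {"no_intrinsic": 0.2, "count_bonus": 0.8, "curiosity": 0.5}
    selector = ShapingSelector(toy_returns)
    for _ in range(rounds):
        choice = selector.select()
        selector.update(choice, toy_returns[choice])
    return selector


if __name__ == "__main__":
    sel = run_toy(500)
    print(sel.counts)
```

In a real training loop, the value passed to `update` would be the task return estimated from the most recent rollouts rather than a fixed number. Including an option with no intrinsic reward in the set is one plausible way such a scheme could fall back to the unbiased extrinsic objective, though that too is an assumption here.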