LG · AI · Aug 6, 2024

Highly Efficient Self-Adaptive Reward Shaping for Reinforcement Learning

Haozhe Ma, Zhengding Luo, Thanh Vinh Vo, Kuankuan Sima, Tze-Yun Leong

arXiv:2408.03029v417.622 citations · h-index: 24

Originality: Incremental advance

AI Analysis

The paper addresses the sparse-reward problem in reinforcement learning for domains with high-dimensional continuous state spaces. The contribution is incremental because it builds on existing reward shaping techniques.

The paper tackles the sparse-reward problem in reinforcement learning by introducing a self-adaptive reward shaping mechanism that uses success rates from historical experiences, resulting in improved sample efficiency and convergence stability over baselines.

Reward shaping is a technique in reinforcement learning that addresses the sparse-reward problem by providing more frequent and informative rewards. We introduce a self-adaptive and highly efficient reward shaping mechanism that incorporates success rates derived from historical experiences as shaped rewards. The success rates are sampled from Beta distributions, which dynamically evolve from uncertain to reliable values as data accumulates. Initially, the shaped rewards exhibit more randomness to encourage exploration, while over time, the increasing certainty enhances exploitation, naturally balancing exploration and exploitation. Our approach employs Kernel Density Estimation (KDE) combined with Random Fourier Features (RFF) to derive the Beta distributions, providing a computationally efficient, non-parametric, and learning-free solution for high-dimensional continuous state spaces. Our method is validated on various tasks with extremely sparse rewards, demonstrating notable improvements in sample efficiency and convergence stability over relevant baselines.
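
The mechanism described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a Gaussian kernel, a uniform Beta(1, 1) prior, and kernel-weighted success and failure counts used directly as Beta parameters. The class name `RFFSuccessRate` and its parameters (`n_features`, `bandwidth`) are hypothetical and chosen only for illustration.

```python
import numpy as np


class RFFSuccessRate:
    """Hypothetical helper: RFF-approximated kernel counts -> Beta-sampled reward."""

    def __init__(self, state_dim, n_features=512, bandwidth=1.0, seed=0):
        self.rng = np.random.default_rng(seed)
        # For the Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 h^2)),
        # random frequencies are drawn from N(0, I / h^2).
        self.W = self.rng.normal(scale=1.0 / bandwidth, size=(n_features, state_dim))
        self.b = self.rng.uniform(0.0, 2.0 * np.pi, size=n_features)
        self.scale = np.sqrt(2.0 / n_features)
        # Running sums of feature vectors for successful and failed experiences.
        self.succ_sum = np.zeros(n_features)
        self.fail_sum = np.zeros(n_features)

    def features(self, states):
        states = np.atleast_2d(states)
        return self.scale * np.cos(states @ self.W.T + self.b)

    def update(self, states, success):
        """Add the states of one trajectory to the success or failure buffer."""
        phi_sum = self.features(states).sum(axis=0)
        if success:
            self.succ_sum += phi_sum
        else:
            self.fail_sum += phi_sum

    def beta_params(self, state):
        phi = self.features(state)[0]
        # phi . sum_i phi_i approximates sum_i k(state, s_i), a soft visit count.
        # The RFF approximation can dip below zero, so counts are clipped.
        n_succ = max(float(phi @ self.succ_sum), 0.0)
        n_fail = max(float(phi @ self.fail_sum), 0.0)
        # Assumption: uniform Beta(1, 1) prior plus the soft counts.
        return n_succ + 1.0, n_fail + 1.0

    def shaped_reward(self, state):
        alpha, beta = self.beta_params(state)
        return float(self.rng.beta(alpha, beta))


# Toy data: successes cluster near (1, 1), failures near (-1, -1).
data_rng = np.random.default_rng(1)
model = RFFSuccessRate(state_dim=2)
model.update(np.array([1.0, 1.0]) + 0.1 * data_rng.normal(size=(50, 2)), success=True)
model.update(np.array([-1.0, -1.0]) + 0.1 * data_rng.normal(size=(50, 2)), success=False)

a_good, b_good = model.beta_params(np.array([1.0, 1.0]))
a_bad, b_bad = model.beta_params(np.array([-1.0, -1.0]))
mean_good = a_good / (a_good + b_good)
mean_bad = a_bad / (a_bad + b_bad)
sample = model.shaped_reward(np.array([1.0, 1.0]))
```

With no data, both parameters equal 1 and the sampled reward is uniform on [0, 1], which corresponds to the exploratory phase. As soft counts grow, the Beta distribution concentrates around the estimated success rate. Keeping only two running feature sums means each query costs time proportional to the number of features rather than to the number of stored states.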
