LG · AI · Aug 6, 2024

Highly Efficient Self-Adaptive Reward Shaping for Reinforcement Learning

arXiv:2408.03029v4 · 22 citations · h-index: 8
AI Analysis

This paper addresses the sparse-reward problem in reinforcement learning for domains with high-dimensional continuous state spaces. The contribution is incremental, as it builds on existing reward shaping techniques.

The paper tackles the sparse-reward problem in reinforcement learning by introducing a self-adaptive reward shaping mechanism that uses success rates from historical experiences, resulting in improved sample efficiency and convergence stability over baselines.

Reward shaping is a technique in reinforcement learning that addresses the sparse-reward problem by providing more frequent and informative rewards. We introduce a self-adaptive and highly efficient reward shaping mechanism that incorporates success rates derived from historical experiences as shaped rewards. The success rates are sampled from Beta distributions, which dynamically evolve from uncertain to reliable values as data accumulates. Initially, the shaped rewards exhibit more randomness to encourage exploration, while over time, the increasing certainty enhances exploitation, naturally balancing exploration and exploitation. Our approach employs Kernel Density Estimation (KDE) combined with Random Fourier Features (RFF) to derive the Beta distributions, providing a computationally efficient, non-parametric, and learning-free solution for high-dimensional continuous state spaces. Our method is validated on various tasks with extremely sparse rewards, demonstrating notable improvements in sample efficiency and convergence stability over relevant baselines.
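The mechanism described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `RFFBetaShaper`, the Gaussian kernel, the `bandwidth` and `num_features` parameters, the `+1` prior on each pseudo-count, and the clipping of negative estimates are all assumptions made for illustration. The idea shown is that RFF-approximated kernel sums over states from successful and failed trajectories act as soft counts, which parameterize a Beta distribution from which a shaped reward is sampled.

```python
import numpy as np


class RFFBetaShaper:
    """Hypothetical sketch: Beta-sampled success rates from RFF-based kernel counts."""

    def __init__(self, state_dim, num_features=512, bandwidth=1.0, seed=0):
        self.rng = np.random.default_rng(seed)
        # Random Fourier Features for a Gaussian kernel with the given bandwidth:
        # frequencies w ~ N(0, I / bandwidth^2), phases b ~ Uniform[0, 2*pi).
        self.W = self.rng.normal(0.0, 1.0 / bandwidth, size=(num_features, state_dim))
        self.b = self.rng.uniform(0.0, 2.0 * np.pi, size=num_features)
        self.scale = np.sqrt(2.0 / num_features)
        # Running sums of feature vectors; memory does not grow with the data.
        self.success_sum = np.zeros(num_features)
        self.failure_sum = np.zeros(num_features)

    def features(self, states):
        """Map states of shape (n, state_dim) to RFF features of shape (n, num_features)."""
        states = np.atleast_2d(np.asarray(states, dtype=float))
        return self.scale * np.cos(states @ self.W.T + self.b)

    def update(self, states, success):
        """Add the states of one trajectory to the success or failure accumulator."""
        total = self.features(states).sum(axis=0)
        if success:
            self.success_sum += total
        else:
            self.failure_sum += total

    def pseudo_counts(self, state):
        """Return (alpha, beta): kernel-weighted counts plus an assumed uniform prior of 1."""
        z = self.features(state)[0]
        # The RFF approximation can dip below zero, so clip before using as counts.
        n_success = max(float(z @ self.success_sum), 0.0)
        n_failure = max(float(z @ self.failure_sum), 0.0)
        return n_success + 1.0, n_failure + 1.0

    def shaped_reward(self, state):
        """Sample a success rate in [0, 1] from Beta(alpha, beta) for this state."""
        alpha, beta = self.pseudo_counts(state)
        return float(self.rng.beta(alpha, beta))
```

With no data, every state gets Beta(1, 1), so the shaped reward is uniformly random, which corresponds to the exploratory phase described in the abstract. As trajectories accumulate, the pseudo-counts grow and the samples concentrate around the estimated success rate. Because only two fixed-size feature sums are stored, the per-query cost is independent of the number of stored experiences.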

