Hierarchical Potential-based Reward Shaping from Task Specifications
This work addresses the design of reward signals for reinforcement learning in robotics, particularly autonomous driving. It appears incremental, since it builds on existing potential-based reward shaping.
The paper tackles the automatic synthesis of policies for robotic-control tasks. It introduces a hierarchical, potential-based reward-shaping approach (HPRS) that defines effective multivariate rewards from task specifications. The resulting policies satisfy the task with improved comfort and converge to optimal behavior faster than state-of-the-art methods.
The automatic synthesis of policies for robotic-control tasks through reinforcement learning relies on a reward signal that simultaneously captures many possibly conflicting requirements. In this paper, we introduce a novel, hierarchical, potential-based reward-shaping approach (HPRS) for defining effective, multivariate rewards for a large family of such control tasks. We formalize a task as a partially-ordered set of safety, target, and comfort requirements, and define an automated methodology to enforce a natural order among requirements and shape the associated reward. Building upon potential-based reward shaping, we show that HPRS preserves policy optimality. Our experimental evaluation demonstrates HPRS's superior ability in capturing the intended behavior, resulting in task-satisfying policies with improved comfort, and converging to optimal behavior faster than other state-of-the-art approaches. We demonstrate the practical usability of HPRS on several robotics applications and the smooth sim2real transition on two autonomous-driving scenarios for F1TENTH race cars.
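The optimality claim rests on the standard form of potential-based reward shaping, in which the shaped reward is r + gamma * Phi(s') - Phi(s). The following is a minimal sketch of that generic mechanism only, not of the hierarchical potential defined by HPRS. The 1-D state, the `potential` function, and the sample trajectory are hypothetical choices made for illustration. The sketch checks numerically that the shaping terms telescope along a trajectory, so the shaped and unshaped returns differ only by a term that depends on the start and end states.

```python
from typing import Callable, Sequence

State = float  # hypothetical 1-D state, e.g. signed distance to a goal at 0


def potential(s: State) -> float:
    """Hypothetical potential: larger (less negative) closer to the goal."""
    return -abs(s)


def shaped_reward(base_reward: float,
                  phi: Callable[[State], float],
                  gamma: float,
                  s: State,
                  s_next: State) -> float:
    """Standard potential-based shaping: r + gamma * phi(s') - phi(s)."""
    return base_reward + gamma * phi(s_next) - phi(s)


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Sum of gamma^t * r_t over a finite trajectory."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


# Illustrative trajectory (assumed data, not taken from the paper).
gamma = 0.9
states = [4.0, 3.0, 1.0, 0.0]   # s_0 .. s_T
base_rewards = [0.0, 0.0, 1.0]  # sparse reward on reaching the goal

shaped_rewards = [
    shaped_reward(r, potential, gamma, s, s_next)
    for r, s, s_next in zip(base_rewards, states, states[1:])
]

# The shaping terms telescope: the difference between the two returns is
# gamma^T * phi(s_T) - phi(s_0), independent of the actions taken in between.
difference = (discounted_return(shaped_rewards, gamma)
              - discounted_return(base_rewards, gamma))
telescoped = (gamma ** len(base_rewards)) * potential(states[-1]) - potential(states[0])
```

Because the difference depends only on the endpoints and not on the intermediate actions, shaping of this form changes how dense the learning signal is without changing which policy is optimal.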