LG AI HC RO

Oct 26, 2022

D-Shape: Demonstration-Shaped Reinforcement Learning via Goal Conditioning

Caroline Wang, Garrett Warnell, Peter Stone

arXiv:2210.14428v2

3.34 citations

h-index: 94

Originality: Incremental advance

AI Analysis

This paper addresses a fundamental challenge in autonomous behavior acquisition for robotics or AI systems, though the advance is incremental because it builds on existing IL and RL methods.

The paper tackles the conflict between imitation learning and reinforcement learning that arises when demonstrations are suboptimal. It introduces D-Shape, which resolves this conflict and enables learning from such demonstrations while still finding the optimal policy. It shows improved sample efficiency over RL and consistent convergence to the optimal policy in sparse-reward gridworld domains.

While combining imitation learning (IL) and reinforcement learning (RL) is a promising way to address poor sample efficiency in autonomous behavior acquisition, methods that do so typically assume that the requisite behavior demonstrations are provided by an expert that behaves optimally with respect to a task reward. If, however, suboptimal demonstrations are provided, a fundamental challenge appears in that the demonstration-matching objective of IL conflicts with the return-maximization objective of RL. This paper introduces D-Shape, a new method for combining IL and RL that uses ideas from reward shaping and goal-conditioned RL to resolve the above conflict. D-Shape allows learning from suboptimal demonstrations while retaining the ability to find the optimal policy with respect to the task reward. We experimentally validate D-Shape in sparse-reward gridworld domains, showing that it both improves over RL in terms of sample efficiency and converges consistently to the optimal policy in the presence of suboptimal demonstrations.
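The abstract names reward shaping and goal-conditioned RL as the two ingredients but gives no formulas. The sketch below illustrates how those ideas can fit together in a gridworld, assuming that a demonstration state serves as the goal and that the potential is the negative Manhattan distance to that goal. The function names, the choice of potential, and the tuple-based state encoding are all hypothetical and are not taken from the paper.

```python
from typing import Tuple

State = Tuple[int, int]  # (row, col) cell in a gridworld (assumed encoding)


def potential(state: State, goal: State) -> float:
    """Hypothetical potential: negative Manhattan distance to the goal."""
    return -float(abs(state[0] - goal[0]) + abs(state[1] - goal[1]))


def shaped_reward(
    task_reward: float,
    state: State,
    goal: State,
    next_state: State,
    next_goal: State,
    gamma: float = 0.99,
) -> float:
    """Task reward plus a potential-based shaping term.

    The shaping term has the standard form
    gamma * phi(s', g') - phi(s, g), where the goals are assumed to be
    demonstration states (an illustrative choice, not the paper's spec).
    """
    shaping = gamma * potential(next_state, next_goal) - potential(state, goal)
    return task_reward + shaping


def goal_conditioned_input(state: State, goal: State) -> Tuple[int, ...]:
    """Concatenate state and goal so a policy can condition on both."""
    return state + goal


# Moving one cell closer to a goal two cells away, with a sparse task reward of 0.
r = shaped_reward(0.0, (0, 0), (2, 0), (1, 0), (2, 0), gamma=1.0)
x = goal_conditioned_input((0, 0), (2, 0))
```

Potential-based shaping terms of this form are known to leave the optimal policy unchanged in the standard setting, which is consistent with the abstract's claim that the optimal policy with respect to the task reward remains reachable even when the demonstrations are suboptimal.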

