LG AI CL

Apr 8, 2022

Grounding Hindsight Instructions in Multi-Goal Reinforcement Learning for Robotics

Frank Röder, Manfred Eppe, Stefan Wermter

arXiv:2204.04308v2

5.87 citations

h-index: 46

Has Code

Originality: Incremental advance

AI Analysis

The paper addresses sample inefficiency in robotic reinforcement learning with natural language goals, but the advance is incremental because it builds on existing hindsight methods.

The paper tackles sample inefficiency in robotic reinforcement learning with sparse rewards and natural language goals. It introduces hindsight instruction replay and a seq2seq model that generates linguistic instructions, and it shows that self-supervised instruction generation improves learning performance by one third, with gains that increase with task complexity.

This paper focuses on robotic reinforcement learning with sparse rewards for natural language goal representations. An open problem is the sample inefficiency that stems from the compositionality of natural language, and from the grounding of language in sensory data and actions. We address these issues with three contributions. We first present a mechanism for hindsight instruction replay utilizing expert feedback. Second, we propose a seq2seq model to generate linguistic hindsight instructions. Finally, we present a novel class of language-focused learning tasks. We show that hindsight instructions improve the learning performance, as expected. In addition, we also provide an unexpected result: we show that the learning performance of our agent can be improved by one third if, in a sense, the agent learns to talk to itself in a self-supervised manner. We achieve this by learning to generate linguistic instructions that would have been appropriate as a natural language goal for an originally unintended behavior. Our results indicate that the performance gain increases with the task complexity.
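The idea of hindsight instruction replay described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Transition` fields, the `relabel_with_hindsight` and `toy_expert` helpers, and the reward convention (0.0 on success, -1.0 otherwise) are all assumptions made for illustration. The sketch takes an episode that failed its original instruction and stores a second copy labeled with an instruction that describes what the agent actually did.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Transition:
    observation: str   # toy observation: name of the object touched, "" if none
    action: int
    instruction: str   # natural language goal the agent was conditioned on
    reward: float      # assumed sparse convention: 0.0 on success, -1.0 otherwise


def relabel_with_hindsight(
    episode: List[Transition],
    describe: Callable[[List[Transition]], Optional[str]],
) -> List[Transition]:
    """Return a copy of the episode relabeled with a hindsight instruction.

    `describe` maps an episode to an instruction that the observed behavior
    would have satisfied, or None if nothing describable happened.
    """
    if not episode:
        return []
    hindsight_instruction = describe(episode)
    if hindsight_instruction is None:
        return []
    last = len(episode) - 1
    # Under the hindsight instruction, the final step counts as a success.
    return [
        replace(
            t,
            instruction=hindsight_instruction,
            reward=0.0 if i == last else -1.0,
        )
        for i, t in enumerate(episode)
    ]


def toy_expert(episode: List[Transition]) -> Optional[str]:
    """Hypothetical stand-in for expert feedback: names the object touched last."""
    touched = episode[-1].observation
    return f"reach the {touched} block" if touched else None


# The agent was told to reach the green block but ended up at the red one.
failed = [
    Transition("", 0, "reach the green block", -1.0),
    Transition("red", 1, "reach the green block", -1.0),
]
relabeled = relabel_with_hindsight(failed, toy_expert)
replay_buffer = failed + relabeled  # both versions are kept for off-policy learning
```

In this sketch `describe` is a plain callable, so a learned generator, such as the seq2seq model the abstract mentions, could in principle take the place of `toy_expert` without changing the relabeling logic.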

