LG · AI · ML · Sep 29, 2020

Align-RUDDER: Learning From Few Demonstrations by Reward Redistribution

arXiv:2009.14108v2 · 46 citations · Has Code
Originality: Incremental advance
AI Analysis

The paper addresses sample inefficiency in reinforcement learning on complex tasks, particularly when only a few high-reward demonstrations are available. It is rated an incremental improvement over existing methods.

Reinforcement learning needs many samples on complex hierarchical tasks with sparse rewards. The paper introduces Align-RUDDER, which redistributes reward using a profile model built from a multiple sequence alignment of the demonstrations. This improves learning from few demonstrations: Align-RUDDER outperforms competitors on artificial tasks and can mine a diamond in Minecraft, though infrequently.

Reinforcement learning algorithms require many samples when solving complex hierarchical tasks with sparse and delayed rewards. For such complex tasks, the recently proposed RUDDER uses reward redistribution to leverage steps in the Q-function that are associated with accomplishing sub-tasks. However, often only few episodes with high rewards are available as demonstrations since current exploration strategies cannot discover them in reasonable time. In this work, we introduce Align-RUDDER, which utilizes a profile model for reward redistribution that is obtained from multiple sequence alignment of demonstrations. Consequently, Align-RUDDER employs reward redistribution effectively and, thereby, drastically improves learning on few demonstrations. Align-RUDDER outperforms competitors on complex artificial tasks with delayed rewards and few demonstrations. On the Minecraft ObtainDiamond task, Align-RUDDER is able to mine a diamond, though not frequently. Code is available at https://github.com/ml-jku/align-rudder. YouTube: https://youtu.be/HO-_8ZUl-UY
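
The reward redistribution idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation (see the linked repository for that); it is a simplified illustration under several assumptions: demonstrations are already aligned, gap-free, equal-length sequences of discrete event symbols; the profile is a position-specific log-odds table with pseudocounts; and a new episode is scored position by position rather than aligned to the profile. The function names `build_profile` and `redistribute` are hypothetical. Each step receives the increase in the profile score of the episode prefix, rescaled so the redistributed rewards sum to the original episode return.

```python
import math
from collections import Counter


def build_profile(demos, alphabet, pseudocount=1.0):
    """Position-specific log-odds scores from gap-free, equal-length demos.

    Assumption: the demonstrations are already aligned column by column.
    """
    length = len(demos[0])
    assert all(len(d) == length for d in demos), "demos must have equal length"

    # Background frequency of each event over all demonstrations.
    counts = Counter(e for d in demos for e in d)
    total = sum(counts.values()) + pseudocount * len(alphabet)
    background = {a: (counts[a] + pseudocount) / total for a in alphabet}

    profile = []
    denom = len(demos) + pseudocount * len(alphabet)
    for i in range(length):
        column = Counter(d[i] for d in demos)
        # Log-odds of seeing event a at position i versus the background.
        profile.append({
            a: math.log(((column[a] + pseudocount) / denom) / background[a])
            for a in alphabet
        })
    return profile


def redistribute(episode, profile, episode_return):
    """Spread episode_return over steps in proportion to profile-score gains."""
    if not episode:
        return []
    # Per-step score gain = difference of consecutive prefix scores.
    # Steps beyond the profile length or with unknown events score zero.
    gains = [
        profile[i].get(e, 0.0) if i < len(profile) else 0.0
        for i, e in enumerate(episode)
    ]
    total = sum(gains)
    if abs(total) < 1e-12:
        # Degenerate case: keep the original delayed reward at the last step.
        return [0.0] * (len(gains) - 1) + [episode_return]
    # Rescale so the redistributed rewards sum to the original return.
    return [episode_return * g / total for g in gains]


# Hypothetical toy data: events A..D, three demonstrations.
demos = [list("ABCD"), list("ABCD"), list("ABDD")]
profile = build_profile(demos, alphabet="ABCD")
rewards = redistribute(list("ABCD"), profile, episode_return=1.0)
print(rewards)
```

In this toy setup, events that are conserved across demonstrations at a given position receive a larger share of the return, which moves reward earlier in the episode instead of leaving it all at the final step.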

Code Implementations: 1 repo