Sample Efficient Reinforcement Learning by Automatically Learning to Compose Subtasks
This paper addresses sample efficiency in sparse-reward environments for RL practitioners, offering an incremental advance over existing methods.
The paper tackles the problem of sample inefficiency in reinforcement learning with sparse rewards by proposing an algorithm that automatically structures reward functions using subtask labels. The result is a significant performance improvement over state-of-the-art baselines, especially as task difficulty increases.
Improving sample efficiency is central to Reinforcement Learning (RL), especially in environments where rewards are sparse. Some recent approaches specify reward functions as manually designed or learned reward structures, whose integration into RL algorithms is claimed to significantly improve learning efficiency. However, manually designed reward structures can be inaccurate, and existing methods for learning them automatically are often computationally intractable for complex tasks. RL algorithms that integrate inaccurate or partial reward structures fail to learn optimal policies. In this work, we propose an RL algorithm that automatically structures the reward function for sample efficiency, given a set of labels that signify subtasks. Given such minimal knowledge about the task, we train a high-level policy that selects optimal subtasks in each state together with a low-level policy that efficiently learns to complete each subtask. We evaluate our algorithm in a variety of sparse-reward environments. The experimental results show that our approach significantly outperforms state-of-the-art baselines as the difficulty of the task increases.
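The abstract does not give the learning rules, so the following is only a minimal sketch of the general two-level idea it describes, not the authors' algorithm. It assumes tabular Q-learning throughout: a high-level Q-function over subtask labels, updated with an SMDP-style rule that discounts by the number of primitive steps a subtask took, and one low-level Q-function per subtask, trained on an intrinsic reward of 1 when that subtask's label is observed. The corridor task, the labels "key" and "door", and every name and hyperparameter (`label`, `step`, `choose`, `train`, `alpha`, `gamma`, `eps`) are hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical toy task: a corridor of N cells with a key at cell 0 and a door at
# the last cell. The sparse task reward is 1 only when the door is reached with the key.
N = 6
START = 2
ACTIONS = (-1, +1)
SUBTASKS = ("key", "door")  # hypothetical subtask labels


def label(pos):
    """Return the subtask label observed at a position, if any."""
    if pos == 0:
        return "key"
    if pos == N - 1:
        return "door"
    return None


def step(state, a):
    """Move left or right; state is (position, has_key)."""
    pos, has_key = state
    pos = min(max(pos + a, 0), N - 1)
    lab = label(pos)
    if lab == "key":
        has_key = True
    done = lab == "door"
    reward = 1.0 if (done and has_key) else 0.0
    return (pos, has_key), reward, done, lab


def choose(q, state, options, eps, rng):
    """Epsilon-greedy choice over options, breaking ties at random."""
    if rng.random() < eps:
        return rng.choice(options)
    values = [q[(state, o)] for o in options]
    best = max(values)
    return rng.choice([o for o, v in zip(options, values) if v == best])


def train(episodes=2000, alpha=0.5, gamma=0.95, eps=0.2, max_steps=50, seed=0):
    rng = random.Random(seed)
    q_high = defaultdict(float)                        # Q(state, subtask)
    q_low = {g: defaultdict(float) for g in SUBTASKS}  # Q_g(state, action)
    for _ in range(episodes):
        state, done, t = (START, False), False, 0
        while not done and t < max_steps:
            g = choose(q_high, state, SUBTASKS, eps, rng)
            start, ret, k = state, 0.0, 0
            # Run the low-level policy for subtask g until its label is observed.
            while not done and t < max_steps:
                a = choose(q_low[g], state, ACTIONS, eps, rng)
                nxt, r, done, lab = step(state, a)
                reached = lab == g
                intrinsic = 1.0 if reached else 0.0
                if reached or done:
                    target = intrinsic
                else:
                    target = gamma * max(q_low[g][(nxt, b)] for b in ACTIONS)
                q_low[g][(state, a)] += alpha * (target - q_low[g][(state, a)])
                ret += (gamma ** k) * r  # discounted task reward inside the subtask
                state, k, t = nxt, k + 1, t + 1
                if reached:
                    break
            # SMDP Q-learning update: bootstrap with gamma^k after k primitive steps.
            future = 0.0 if done else (gamma ** k) * max(q_high[(state, h)] for h in SUBTASKS)
            q_high[(start, g)] += alpha * (ret + future - q_high[(start, g)])
    return q_high, q_low
```

After `q_high, q_low = train()`, comparing `q_high[((START, False), "key")]` with `q_high[((START, False), "door")]` shows whether the high-level policy has learned to pursue the key subtask first. The low-level learners see only the dense intrinsic signal, while the high-level learner alone sees the sparse task reward.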