Hierarchical Decision Transformer
This work addresses the challenge of learning effective policies from demonstrations in reinforcement learning, particularly for tasks with long horizons and sparse rewards. It represents an incremental improvement over existing sequence-model methods.
The paper tackles the problem of learning from demonstrations with sequence models in reinforcement learning. It introduces a hierarchical algorithm in which a high-level mechanism selects sub-goals that guide a low-level controller. This sub-goal sequence replaces the returns-to-go used by earlier methods and improves performance, especially in tasks with longer episodes and sparser rewards. The method outperforms the baselines in eight out of ten tasks across the OpenAI Gym, D4RL, and RoboMimic benchmarks.
Sequence models in reinforcement learning require task knowledge to estimate the task policy. This paper presents a hierarchical algorithm for learning a sequence model from demonstrations. A high-level mechanism guides the low-level controller through the task by selecting sub-goals for it to reach. This sequence of sub-goals replaces the returns-to-go used by previous methods, improving overall performance, especially in tasks with longer episodes and sparser rewards. We validate our method on multiple tasks from the OpenAI Gym, D4RL, and RoboMimic benchmarks. Our method outperforms the baselines in eight out of ten tasks of varied horizons and reward frequencies without prior task knowledge, showing the advantages of a hierarchical approach for learning from demonstrations with a sequence model.
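The abstract describes the mechanism only at a high level. The Python sketch below is an illustration under stated assumptions, not the authors' implementation. It contrasts the return-to-go conditioning used by earlier sequence models with sub-goal conditioning, and it shows a two-level control loop in which a high level proposes sub-goals and a low level acts toward them. Several choices here are hypothetical. The sub-goal relabeling rule uses a fixed `lookahead` window, and the paper's actual selection rule may differ. Both levels are plain Markov callables rather than sequence models. The function names (`subgoal_labels`, `interleave`, `hierarchical_rollout`) are illustrative.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

S = TypeVar("S")  # state type
A = TypeVar("A")  # action type
C = TypeVar("C")  # conditioning token type (return-to-go or sub-goal)


def returns_to_go(rewards: Sequence[float]) -> List[float]:
    """Return-to-go at each step t: the sum of rewards from t to the episode end."""
    rtg: List[float] = []
    total = 0.0
    for r in reversed(rewards):
        total += r
        rtg.append(total)
    rtg.reverse()
    return rtg


def subgoal_labels(states: Sequence[S], lookahead: int) -> List[S]:
    """Hypothetical relabeling: the sub-goal for step t is the demonstrated state
    `lookahead` steps ahead, clipped to the final state of the trajectory."""
    last = len(states) - 1
    return [states[min(t + lookahead, last)] for t in range(len(states))]


def interleave(
    cond: Sequence[C], states: Sequence[S], actions: Sequence[A]
) -> List[Tuple[str, object]]:
    """Build the (conditioning, state, action) token sequence for a sequence model.
    The layout is identical whether `cond` holds returns-to-go or sub-goals."""
    tokens: List[Tuple[str, object]] = []
    for c, s, a in zip(cond, states, actions):
        tokens.extend([("cond", c), ("state", s), ("action", a)])
    return tokens


def hierarchical_rollout(
    high_level: Callable[[S], S],
    low_level: Callable[[S, S], A],
    step: Callable[[S, A], S],
    reached: Callable[[S, S], bool],
    start: S,
    horizon: int,
) -> List[S]:
    """The high level picks a sub-goal and the low level acts toward it. A new
    sub-goal is requested once the current one is reached. Both levels are
    Markov callables here for brevity (an assumption made for this sketch)."""
    state = start
    goal = high_level(state)
    visited = [state]
    for _ in range(horizon):
        if reached(state, goal):
            goal = high_level(state)
        action = low_level(state, goal)
        state = step(state, action)
        visited.append(state)
    return visited


if __name__ == "__main__":
    # Toy 1-D chain: states are integers 0..10, actions are -1, 0, or +1.
    path = hierarchical_rollout(
        high_level=lambda s: min(s + 3, 10),       # propose a state 3 steps ahead
        low_level=lambda s, g: (g > s) - (g < s),  # move one step toward the goal
        step=lambda s, a: s + a,
        reached=lambda s, g: s == g,
        start=0,
        horizon=12,
    )
    print(path)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10]
```

Computing returns-to-go needs the demonstration's rewards. At test time it also needs a target return chosen by the user, which is one form of task knowledge. A sub-goal, by contrast, is just a state, so at test time the high level can supply it without a target return. In this sketch, `interleave` keeps the same token layout in both cases to make that substitution explicit.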