LGJun 14, 2023
Skill-Critic: Refining Learned Skills for Hierarchical Reinforcement LearningCe Hao, Catherine Weaver, Chen Tang et al.
Hierarchical reinforcement learning (RL) can accelerate long-horizon decision-making by temporally abstracting a policy into multiple levels. Promising results in sparse reward environments have been seen with skills, i.e. sequences of primitive actions. Typically, a skill latent space and policy are discovered from offline data. However, the resulting low-level policy can be unreliable due to low-coverage demonstrations or distribution shifts. As a solution, we propose the Skill-Critic algorithm to fine-tune the low-level policy in conjunction with high-level skill selection. Our Skill-Critic algorithm optimizes both the low-level and high-level policies; these policies are initialized and regularized by the latent space learned from offline demonstrations to guide the parallel policy optimization. We validate Skill-Critic in multiple sparse-reward RL environments, including a new sparse-reward autonomous racing task in Gran Turismo Sport. The experiments show that Skill-Critic's low-level policy fine-tuning and demonstration-guided regularization are essential for good performance. Code and videos are available at our website: https://sites.google.com/view/skill-critic.
LGFeb 22, 2024Code
BeTAIL: Behavior Transformer Adversarial Imitation Learning from Human Racing GameplayCatherine Weaver, Chen Tang, Ce Hao et al.
Imitation learning learns a policy from demonstrations without requiring hand-designed reward functions. In many robotic tasks, such as autonomous racing, imitated policies must model complex environment dynamics and human decision-making. Sequence modeling is highly effective in capturing intricate patterns of motion sequences but struggles to adapt to new environments or distribution shifts that are common in real-world robotics tasks. In contrast, Adversarial Imitation Learning (AIL) can mitigate this effect, but struggles with sample inefficiency and handling complex motion patterns. Thus, we propose BeTAIL: Behavior Transformer Adversarial Imitation Learning, which combines a Behavior Transformer (BeT) policy from human demonstrations with online AIL. BeTAIL adds an AIL residual policy to the BeT policy to model the sequential decision-making process of human experts and correct for out-of-distribution states or shifts in environment dynamics. We test BeTAIL on three challenges with expert-level demonstrations of real human gameplay in Gran Turismo Sport. Our proposed residual BeTAIL reduces environment interactions and improves racing performance and stability, even when the BeT is pretrained on different tracks than downstream learning. Videos and code available at: https://sites.google.com/berkeley.edu/BeTAIL/home.