RO | LG | Feb 4

HoRD: Robust Humanoid Control via History-Conditioned Reinforcement Learning and Online Distillation

Puyue Wang, Jiawei Hu, Yan Gao, Junyan Wang, Yu Zhang, Gillian Dobbie, Tao Gu, Wafa Johal, Ting Dang, Hong Jia

arXiv:2602.04412v2 | 2.2 | h-index: 22

Originality: Incremental advance

AI Analysis

This work addresses robustness issues in humanoid control for robotics applications, though it appears incremental as it builds on existing methods like reinforcement learning and distillation.

The paper tackles the performance drops that humanoid robots experience under domain shift. It proposes HoRD, a two-stage learning framework that combines history-conditioned reinforcement learning with online distillation. The result is a single policy that adapts zero-shot to unseen domains without retraining and outperforms baselines in robustness and transfer.

Humanoid robots can suffer significant performance drops under small changes in dynamics, task specifications, or environment setup. We propose HoRD, a two-stage learning framework for robust humanoid control under domain shift. First, we train a high-performance teacher policy via history-conditioned reinforcement learning, where the policy infers latent dynamics context from recent state-action trajectories to adapt online to diverse randomized dynamics. Second, we perform online distillation to transfer the teacher's robust control capabilities into a transformer-based student policy that operates on sparse root-relative 3D joint keypoint trajectories. By combining history-conditioned adaptation with online distillation, HoRD enables a single policy to adapt zero-shot to unseen domains without per-domain retraining. Extensive experiments show HoRD outperforms strong baselines in robustness and transfer, especially under unseen domains and external perturbations. Code and project page are available at https://tonywang-0517.github.io/hord/.
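
The two stages described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation, and every name in it (`HISTORY_LEN`, `step`, `estimate_gain`, `features`, `teacher_action`, `distill`) is hypothetical. It makes several assumptions: a 1-D system with a hidden randomized gain stands in for the humanoid; the teacher is hand-coded rather than trained with reinforcement learning; the student is a two-weight linear model rather than a transformer over keypoint trajectories; and "online distillation" is read as a DAgger-style loop in which the student's own actions drive the rollout while the teacher supplies the labels.

```python
import random
from collections import deque

HISTORY_LEN = 8  # hypothetical window size, not taken from the paper


def step(x, u, gain):
    """Toy 1-D dynamics; `gain` is the hidden, randomized parameter."""
    return x + gain * u


def estimate_gain(history, default=1.0):
    """Least-squares estimate of the hidden gain from (x, u, x_next) tuples."""
    num = sum(u * (x_next - x) for x, u, x_next in history)
    den = sum(u * u for _, u, _ in history)
    return num / den if den > 1e-9 else default


def features(x, history):
    """Student input: the state and the state scaled by the inferred context."""
    return (x, x / estimate_gain(history))


def teacher_action(x, history):
    """Stand-in teacher (hand-coded, not RL-trained): cancel x in one step."""
    return -x / estimate_gain(history)


def distill(episodes=2000, horizon=5, lr=0.5, seed=0):
    """Online distillation, assumed DAgger-style: student acts, teacher labels."""
    rng = random.Random(seed)
    w = [0.0, 0.0]  # linear student: u = w[0]*phi[0] + w[1]*phi[1]
    for _ in range(episodes):
        gain = rng.uniform(0.5, 2.0)  # domain randomization over the dynamics
        x = rng.uniform(-1.0, 1.0)
        history = deque(maxlen=HISTORY_LEN)
        for _ in range(horizon):
            phi = features(x, history)
            u_student = w[0] * phi[0] + w[1] * phi[1]
            err = teacher_action(x, history) - u_student
            # Normalized least-mean-squares update toward the teacher's label.
            scale = lr * err / (phi[0] ** 2 + phi[1] ** 2 + 1e-8)
            w = [w[0] + scale * phi[0], w[1] + scale * phi[1]]
            # The student's own action, not the teacher's, drives the rollout.
            x_next = step(x, u_student, gain)
            history.append((x, u_student, x_next))
            x = x_next
    return w
```

Calling `distill()` should leave the student weights close to `(0, -1)`, meaning the student reproduces the teacher's action on the second feature. The point of the sketch is the data flow: the context is inferred from a short history window, and the student is supervised on the states that it visits itself.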

