RobotDancing: Residual-Action Reinforcement Learning Enables Robust Long-Horizon Humanoid Motion Tracking
This work addresses the brittleness of humanoid motion control on dynamic tasks, with the goal of robust performance on real hardware.
The paper tackles long-horizon, high-dynamic motion tracking on humanoids, where model-plant mismatch causes tracking errors to accumulate. It proposes RobotDancing, a framework that predicts residual joint targets to correct dynamics discrepancies. The framework tracks multi-minute, high-energy behaviors such as jumps and cartwheels, and deploys zero-shot from simulation to hardware with high motion quality.
Long-horizon, high-dynamic motion tracking on humanoids remains brittle because absolute joint commands cannot compensate for model-plant mismatch, leading to error accumulation. We propose RobotDancing, a simple, scalable framework that predicts residual joint targets to explicitly correct dynamics discrepancies. The pipeline is end-to-end (training, sim-to-sim validation, and zero-shot sim-to-real) and uses a single-stage reinforcement learning (RL) setup with a unified observation, reward, and hyperparameter configuration. We evaluate primarily on Unitree G1 with retargeted LAFAN1 dance sequences and validate transfer on H1/H1-2. RobotDancing can track multi-minute, high-energy behaviors (jumps, spins, cartwheels) and deploys zero-shot to hardware with high motion tracking quality.
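To make the residual-target idea concrete, the following is a minimal sketch of how a residual action could be turned into a joint command. It is an illustration under assumptions, not the authors' implementation: the function names `residual_joint_targets` and `pd_torques`, the residual `scale`, the joint-limit clipping, and the PD gains are all hypothetical and are not stated in the abstract. The only element taken from the text is that the policy output is added to the reference joint target instead of replacing it.

```python
from typing import List, Optional, Sequence


def residual_joint_targets(
    q_ref: Sequence[float],
    residual: Sequence[float],
    scale: float = 0.25,  # assumed residual scale, not from the paper
    lower: Optional[Sequence[float]] = None,
    upper: Optional[Sequence[float]] = None,
) -> List[float]:
    """Add a scaled policy residual to the reference joint positions.

    q_ref    : joint positions from the retargeted reference motion (rad)
    residual : raw policy output, one value per joint
    lower/upper : optional joint limits used to clip the result (assumed)
    """
    if len(q_ref) != len(residual):
        raise ValueError("q_ref and residual must have the same length")
    # Residual formulation: the policy only corrects the reference.
    targets = [r + scale * d for r, d in zip(q_ref, residual)]
    if lower is not None:
        targets = [max(t, lo) for t, lo in zip(targets, lower)]
    if upper is not None:
        targets = [min(t, hi) for t, hi in zip(targets, upper)]
    return targets


def pd_torques(
    q_target: Sequence[float],
    q: Sequence[float],
    dq: Sequence[float],
    kp: float,
    kd: float,
) -> List[float]:
    """Joint-level PD law that tracks the target (gains are illustrative)."""
    return [kp * (qt - qi) - kd * dqi for qt, qi, dqi in zip(q_target, q, dq)]


if __name__ == "__main__":
    q_ref = [0.10, -0.20, 0.35]        # reference pose for three joints
    policy_out = [0.40, -0.80, 0.00]   # hypothetical policy residual
    q_cmd = residual_joint_targets(q_ref, policy_out)
    tau = pd_torques(q_cmd, q=[0.0, 0.0, 0.3], dq=[0.0, 0.0, 0.0], kp=40.0, kd=1.0)
    print(q_cmd, tau)
```

With a zero residual the command reduces to the reference pose, so in this sketch the policy only has to learn the correction that offsets the mismatch between simulated and real dynamics. An absolute-action policy would instead have to reproduce the full joint trajectory.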