RO · CV · Nov 30, 2025

Opening the Sim-to-Real Door for Humanoid Pixel-to-Action Policy Transfer

arXiv:2512.01061v1 · 19 citations · h-index: 17
Originality: Highly original
AI Analysis

This work addresses sim-to-real transfer for humanoid robots that perceive the world through RGB images alone, applied to articulated loco-manipulation tasks such as opening doors.

The paper transfers vision-based humanoid loco-manipulation policies from simulation to reality. The resulting policy achieves robust zero-shot performance across diverse door types and outperforms human teleoperators by up to 31.7% in task completion time.

Recent progress in GPU-accelerated, photorealistic simulation has opened a scalable data-generation path for robot learning, where massive physics and visual randomization allow policies to generalize beyond curated environments. Building on these advances, we develop a teacher-student-bootstrap learning framework for vision-based humanoid loco-manipulation, using articulated-object interaction as a representative high-difficulty benchmark. Our approach introduces a staged-reset exploration strategy that stabilizes long-horizon privileged-policy training, and a GRPO-based fine-tuning procedure that mitigates partial observability and improves closed-loop consistency in sim-to-real RL. Trained entirely on simulation data, the resulting policy achieves robust zero-shot performance across diverse door types and outperforms human teleoperators by up to 31.7% in task completion time under the same whole-body control stack. This represents the first humanoid sim-to-real policy capable of diverse articulated loco-manipulation using pure RGB perception.
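The abstract names GRPO-based fine-tuning but does not spell out the update. As a rough illustration only, the sketch below shows the group-relative advantage step commonly associated with GRPO: several rollouts are collected as one group, and each rollout's return is normalized against that group's mean and standard deviation, so no learned value function is needed. The function name `group_relative_advantages`, the `eps` constant, the use of the population standard deviation, and the example returns are all assumptions made for illustration; they are not taken from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(returns, eps=1e-8):
    """Normalize each rollout's return against its own group.

    Assumption: `returns` holds scalar episode returns for rollouts
    sampled as one group. This is a generic GRPO-style normalization,
    not the paper's exact procedure.
    """
    mu = mean(returns)
    sigma = pstdev(returns)  # population std; implementations vary here
    # eps keeps the division finite when every rollout scores the same
    return [(r - mu) / (sigma + eps) for r in returns]


# Hypothetical group of four rollouts: two succeed (1.0), two fail (0.0)
group_returns = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(group_returns)
```

With this normalization, rollouts that beat their group average receive positive advantages and the rest receive negative ones. A group in which every rollout scores the same contributes zero advantage and therefore no learning signal.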

