TAGA: Terrain-aware Active Gaze Learning for Generalizable Agile Humanoid Locomotion
This work addresses the trade-off between perceptual coverage and onboard computational constraints in humanoid locomotion, enabling more generalizable agile locomotion on hardware.
TAGA introduces a terrain-aware active gaze learning framework that enables humanoid robots to selectively attend to relevant terrain regions, achieving robust locomotion across diverse challenging terrains. In real-world tests, the system traverses a 1.2 m gap, the largest distance reported among perceptive humanoid locomotion systems.
Agile humanoid locomotion across diverse challenging terrain demands both wide perceptual coverage and precise understanding of local geometry. Motivated by the way humans selectively look at relevant terrain during locomotion, we introduce TAGA, a Terrain-aware Active Gaze learning framework for Attention-based humanoid control. By fusing vision, proprioception, and motion commands, our framework guides the model to learn anticipatory cues and actively attend to specific areas of the height scan, passing only these informative regions to the downstream network. This adaptively increases the information density of observations under tight onboard computational constraints, enabling fine-grained perceptive locomotion over larger-scale terrains. We find that such gaze behaviors emerge naturally through reinforcement learning alone, without additional supervision or explicit guidance, and that they significantly improve training efficiency. As a result, the trained policy demonstrates robust and generalizable locomotion in simulation and on hardware, including reliable terrain-aware foothold selection, elevated-platform traversal, competitive sparse-foothold traversal, and the largest reported real-world gap traversal distance of 1.2 m among perceptive humanoid locomotion systems, while maintaining stability under severe perceptual disturbances and environmental interference.
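To make the idea of attending to part of the height scan concrete, the following is a minimal sketch, not the TAGA architecture, which the abstract does not specify. It assumes a hypothetical linear "gaze head" that maps fused features to a location on a large height map, followed by a fixed-size crop around that location. The names `gaze_center`, `crop_gaze_window`, the feature vector, and the weights are all illustrative assumptions; in the actual framework the gaze behavior is learned through reinforcement learning.

```python
import math


def gaze_center(features, weights, rows, cols):
    """Map a fused feature vector to a (row, col) gaze center on the height map.

    Hypothetical linear head with tanh squashing; a stand-in for a learned network.
    """
    u = math.tanh(sum(w * f for w, f in zip(weights[0], features)))
    v = math.tanh(sum(w * f for w, f in zip(weights[1], features)))
    # Rescale from (-1, 1) to valid grid indices, rounding to the nearest cell.
    r = int((u + 1.0) / 2.0 * (rows - 1) + 0.5)
    c = int((v + 1.0) / 2.0 * (cols - 1) + 0.5)
    return r, c


def crop_gaze_window(height_map, center, window=5):
    """Return a window x window patch around `center`, clamped to stay inside the map."""
    rows, cols = len(height_map), len(height_map[0])
    half = window // 2
    r0 = min(max(center[0] - half, 0), rows - window)
    c0 = min(max(center[1] - half, 0), cols - window)
    return [row[c0:c0 + window] for row in height_map[r0:r0 + window]]


# Illustrative data: a 20 x 20 height map whose cell value encodes its own index.
height_map = [[100 * r + c for c in range(20)] for r in range(20)]
# Assumed fused features (e.g. proprioception, command, coarse vision summary).
features = [0.2, -0.1, 0.5]
weights = [[1.0, 0.0, 0.5], [0.0, 2.0, -0.5]]

center = gaze_center(features, weights, rows=20, cols=20)
patch = crop_gaze_window(height_map, center, window=5)
```

In this sketch the downstream network would receive 25 height values instead of 400, which is one simple way to read "increasing the information density of observations" under a fixed compute budget. A hard crop is not differentiable with respect to the gaze location, so a learned system would typically rely on policy-gradient training or a soft attention weighting instead.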