RO · Apr 9

HEX: Humanoid-Aligned Experts for Cross-Embodiment Whole-Body Manipulation

arXiv:2604.07993 · 99.13 citations · h-index: 25
Predicted impact: top 1% in RO · last 90 days
Originality: Highly original
AI Analysis

The paper addresses unstable and inefficient whole-body control for humanoid robots, aiming to improve manipulation in fast-reaction and long-horizon scenarios.

The paper tackles high-DoF whole-body manipulation for humanoid robots by proposing HEX, a state-centric framework that combines a humanoid-aligned universal state representation with a Mixture-of-Experts predictor. The authors report state-of-the-art task success rate and generalization on real-world tasks.

Humans achieve complex manipulation through coordinated whole-body control, whereas most Vision-Language-Action (VLA) models treat robot body parts largely independently, making high-DoF humanoid control challenging and often unstable. We present HEX, a state-centric framework for coordinated manipulation on full-sized bipedal humanoid robots. HEX introduces a humanoid-aligned universal state representation for scalable learning across heterogeneous embodiments, and incorporates a Mixture-of-Experts Unified Proprioceptive Predictor to model whole-body coordination and temporal motion dynamics from large-scale multi-embodiment trajectory data. To efficiently capture temporal visual context, HEX uses lightweight history tokens to summarize past observations, avoiding repeated encoding of historical images during inference. It further employs a residual-gated fusion mechanism with a flow-matching action head to adaptively integrate visual-language cues with proprioceptive dynamics for action generation. Experiments on real-world humanoid manipulation tasks show that HEX achieves state-of-the-art performance in task success rate and generalization, particularly in fast-reaction and long-horizon scenarios.
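To make the "residual-gated fusion" idea in the abstract more concrete, here is a minimal sketch of one generic way such a mechanism can work: a sigmoid gate, computed from both feature streams, scales a projected proprioceptive update that is added residually to the visual-language feature. This is an illustration only and is not taken from the paper. The function names, the shapes, the gate formula, and the weights (`w_gate`, `b_gate`, `w_proj`) are all assumptions, and HEX's actual fusion may differ.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(w, x):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def residual_gated_fusion(vl, proprio, w_gate, b_gate, w_proj):
    """Hypothetical fusion: fused = vl + gate * (w_proj @ proprio).

    vl      : visual-language feature vector (length d_vl)
    proprio : proprioceptive feature vector (length d_p)
    w_gate  : d_vl x (d_vl + d_p) gate weights (assumed form)
    b_gate  : gate bias (length d_vl)
    w_proj  : d_vl x d_p projection of proprio into the vl space
    """
    gate_in = vl + proprio  # list concatenation of the two streams
    gate = [sigmoid(z + b) for z, b in zip(matvec(w_gate, gate_in), b_gate)]
    update = matvec(w_proj, proprio)
    # Residual form: the vl feature passes through unchanged when the gate closes.
    return [v + g * u for v, g, u in zip(vl, gate, update)]

# Toy inputs (made up for illustration).
vl = [0.5, -1.0]
proprio = [1.0, 2.0, 3.0]
w_gate = [[0.0] * 5, [0.0] * 5]          # zero weights: the gate depends on the bias only
w_proj = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

half = residual_gated_fusion(vl, proprio, w_gate, [0.0, 0.0], w_proj)
closed = residual_gated_fusion(vl, proprio, w_gate, [-50.0, -50.0], w_proj)
```

With a zero bias the gate sits at 0.5, so half of the projected proprioceptive update is mixed in. With a strongly negative bias the gate closes and the output is essentially the visual-language feature alone, which is the property that makes the residual form a safe default.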

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
