RO · AI · ET · SP · SY · Sep 22, 2025

HuMam: Humanoid Motion Control via End-to-End Deep Reinforcement Learning with Mamba

arXiv:2509.18046v1 · 1 citation · h-index: 1
Originality: Highly original
AI Analysis

This paper addresses practical challenges in humanoid motion control for robotics. It is an incremental improvement that applies a novel method to a known bottleneck.

The paper targets three problems in end-to-end reinforcement learning for humanoid locomotion: training instability, inefficient feature fusion, and high actuation cost. It presents HuMam, a framework that pairs a Mamba encoder with PPO optimization. On the JVRC-1 humanoid in mc-mujoco, the authors report better learning efficiency, training stability, and task performance, along with lower power consumption and torque peaks.

End-to-end reinforcement learning (RL) for humanoid locomotion is appealing for its compact perception-action mapping, yet practical policies often suffer from training instability, inefficient feature fusion, and high actuation cost. We present HuMam, a state-centric end-to-end RL framework that employs a single-layer Mamba encoder to fuse robot-centric states with oriented footstep targets and a continuous phase clock. The policy outputs joint position targets tracked by a low-level PD loop and is optimized with PPO. A concise six-term reward balances contact quality, swing smoothness, foot placement, posture, and body stability while implicitly promoting energy saving. On the JVRC-1 humanoid in mc-mujoco, HuMam consistently improves learning efficiency, training stability, and overall task performance over a strong feedforward baseline, while reducing power consumption and torque peaks. To our knowledge, this is the first end-to-end humanoid RL controller that adopts Mamba as the fusion backbone, demonstrating tangible gains in efficiency, stability, and control economy.
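The abstract mentions a continuous phase clock, joint position targets tracked by a low-level PD loop, and power consumption as an evaluation quantity. A minimal sketch of how such pieces are commonly written is given below. It is illustrative only and is not taken from the paper: the sin/cos phase encoding, the zero-velocity PD target, the torque clipping, the |torque × velocity| power measure, and all function names, gains, and limits are assumptions.

```python
import math


def phase_clock(step, period_steps):
    """Encode a cyclic gait phase as (sin, cos).

    Assumption: a two-component encoding, which stays continuous when the
    phase wraps from the end of one gait cycle to the start of the next.
    """
    phi = 2.0 * math.pi * (step % period_steps) / period_steps
    return math.sin(phi), math.cos(phi)


def pd_torques(q_target, q, qd, kp, kd, tau_max):
    """Track joint position targets with a PD law.

    Assumptions: the target joint velocity is zero, and the resulting
    torque is clipped to a symmetric per-joint limit tau_max.
    """
    torques = []
    for qt, qi, qdi, kpi, kdi, lim in zip(q_target, q, qd, kp, kd, tau_max):
        tau = kpi * (qt - qi) - kdi * qdi
        torques.append(max(-lim, min(lim, tau)))
    return torques


def mechanical_power(torques, qd):
    """Sum of |torque * joint velocity| over joints (one common power proxy)."""
    return sum(abs(t * v) for t, v in zip(torques, qd))


# Hypothetical single-joint example: gains and limits are made-up values.
tau = pd_torques([0.1], [0.0], [0.5], kp=[100.0], kd=[10.0], tau_max=[30.0])
power = mechanical_power(tau, [0.5])
```

In a setup like this, the policy would run at a lower rate and emit `q_target`, while the PD loop runs at the simulator's control rate. Lower peak values of `tau` and lower `power` correspond to the actuation-cost quantities the abstract refers to.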

