RO · AI · CV · IV · SY · Aug 16, 2025

LocoMamba: Vision-Driven Locomotion via End-to-End Deep Reinforcement Learning with Mamba

arXiv:2508.11849v2 · 5 citations · h-index: 1 · Adv Eng Informatics
Originality: Incremental advance
AI Analysis

This work addresses robot navigation in complex terrains with obstacles, offering incremental improvements in efficiency and generalization for robotics applications.

The paper tackles vision-driven locomotion for robots by introducing LocoMamba, an end-to-end deep reinforcement learning framework that uses Mamba for efficient sequence modeling. In simulated environments, it achieves higher returns, higher success rates, and faster convergence than state-of-the-art baselines.

We introduce LocoMamba, a vision-driven cross-modal DRL framework built on selective state-space models, specifically leveraging Mamba, that achieves near-linear-time sequence modeling, effectively captures long-range dependencies, and enables efficient training with longer sequences. First, we embed proprioceptive states with a multilayer perceptron and patchify depth images with a lightweight convolutional neural network, producing compact tokens that improve state representation. Second, stacked Mamba layers fuse these tokens via near-linear-time selective scanning, reducing latency and memory footprint, remaining robust to token length and image resolution, and providing an inductive bias that mitigates overfitting. Third, we train the policy end-to-end with Proximal Policy Optimization under terrain and appearance randomization and an obstacle-density curriculum, using a compact state-centric reward that balances progress, smoothness, and safety. We evaluate our method in challenging simulated environments with static and moving obstacles as well as uneven terrain. Compared with state-of-the-art baselines, our method achieves higher returns and success rates with fewer collisions, exhibits stronger generalization to unseen terrains and obstacle densities, and improves training efficiency by converging in fewer updates under the same compute budget.
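The near-linear-time selective scanning mentioned in the abstract can be illustrated with a minimal sketch of a selective state-space recurrence. This is not the authors' implementation: it assumes a single scalar feature per token, a diagonal state matrix, and hypothetical parameter names (`a`, `w_b`, `w_c`, `w_delta`, `b_delta`) chosen only for illustration. The step size and the input and output projections depend on the current token, which is what makes the scan "selective", and the loop touches each token once, so cost grows linearly with sequence length.

```python
import math


def softplus(z):
    """Numerically stable softplus; keeps the step size positive."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def selective_scan(xs, a, w_b, w_c, w_delta, b_delta=0.0):
    """Single-channel selective SSM scan (illustrative simplification).

    xs      : list of floats, one scalar feature per token
    a       : list of negative floats, diagonal of the state matrix
    w_b, w_c: lists of floats giving input-dependent B_t and C_t
    w_delta : float weight for the input-dependent step size
    b_delta : float bias for the step size
    Returns (outputs, final_state).
    """
    n = len(a)
    h = [0.0] * n  # hidden state, carried across tokens
    ys = []
    for x in xs:
        # Input-dependent step size: how strongly this token updates the state.
        delta = softplus(w_delta * x + b_delta)
        y = 0.0
        for i in range(n):
            a_bar = math.exp(delta * a[i])  # zero-order-hold discretization of A
            b_bar = delta * (w_b[i] * x)    # Euler discretization, B_t = w_b * x
            h[i] = a_bar * h[i] + b_bar * x  # linear recurrence, O(1) per state dim
            y += (w_c[i] * x) * h[i]         # readout with C_t = w_c * x
        ys.append(y)
    return ys, h


# Example: an impulse followed by a zero token; the state decays between tokens.
outputs, state = selective_scan([1.0, 0.0], a=[-1.0], w_b=[1.0], w_c=[1.0], w_delta=0.0)
```

In a full model, each token would be a vector (for example, an embedded proprioceptive state or a depth-image patch), the projections would be learned linear layers, and the scan would be run in parallel on hardware; the recurrence above only shows the per-token update.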

