RO | AI | Sep 8, 2025

Learning to Walk with Less: a Dyna-Style Approach to Quadrupedal Locomotion

arXiv:2509.06296v1 | 2 citations | h-index: 8
Originality: Incremental advance
AI Analysis

This work addresses data inefficiency in robot locomotion and offers an incremental improvement for researchers and practitioners in reinforcement learning and robotics.

The paper tackles the low data efficiency of RL-based locomotion controllers with a model-based reinforcement learning framework that supplements simulated rollouts with synthetic data. For quadrupedal locomotion, it reports higher policy return and lower variance while using fewer simulated steps.

Traditional RL-based locomotion controllers often suffer from low data efficiency, requiring extensive interaction to achieve robust performance. We present a model-based reinforcement learning (MBRL) framework that improves sample efficiency for quadrupedal locomotion by appending synthetic data to the end of standard rollouts in PPO-based controllers, following the Dyna-Style paradigm. A predictive model, trained alongside the policy, generates short-horizon synthetic transitions that are gradually integrated using a scheduling strategy based on the policy update iterations. Through an ablation study, we identified a strong correlation between sample efficiency and rollout length, which guided the design of our experiments. We validated our approach in simulation on the Unitree Go1 robot and showed that replacing part of the simulated steps with synthetic ones not only mimics extended rollouts but also improves policy return and reduces variance. Finally, we demonstrate that this improvement transfers to the ability to track a wide range of locomotion commands using fewer simulated steps.
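
The rollout-extension idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The linear ramp in `synthetic_horizon`, the names `warmup_iters` and `max_horizon`, and the scalar-action toy interfaces for `policy` and `model` are all assumptions made for illustration, since the abstract only says that synthetic transitions are integrated gradually according to the policy update iteration. Rewards and value estimates, which a PPO update would also need, are omitted for brevity.

```python
from typing import Callable, List, Tuple

State = Tuple[float, ...]
Transition = Tuple[State, float, State]  # (state, action, next_state)


def synthetic_horizon(iteration: int, warmup_iters: int, max_horizon: int) -> int:
    """Hypothetical linear schedule (an assumption, not from the paper):
    0 synthetic steps at iteration 0, ramping up to max_horizon
    once iteration reaches warmup_iters."""
    if warmup_iters <= 0:
        return max_horizon
    frac = min(max(iteration / warmup_iters, 0.0), 1.0)
    return int(frac * max_horizon)


def extend_rollout(
    real_rollout: List[Transition],
    policy: Callable[[State], float],
    model: Callable[[State, float], State],
    horizon: int,
) -> List[Transition]:
    """Append `horizon` model-generated transitions to the end of a real rollout."""
    extended = list(real_rollout)
    if not real_rollout or horizon <= 0:
        return extended
    # Start imagining from the last state actually reached in simulation.
    state = real_rollout[-1][2]
    for _ in range(horizon):
        action = policy(state)
        next_state = model(state, action)  # learned predictive model stands in for the simulator
        extended.append((state, action, next_state))
        state = next_state
    return extended
```

In a training loop under these assumptions, each policy update would call `synthetic_horizon` with the current iteration and pass the result to `extend_rollout`. Early updates would then rely only on simulated data while the predictive model is still inaccurate.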

