LG · CV · RO · Mar 3, 2025

Multi-Stage Manipulation with Demonstration-Augmented Reward, Policy, and World Model Learning

arXiv:2503.01837v2 · 5 citations · h-index: 19 · ICML
Originality: Incremental advance
AI Analysis

This paper addresses inefficient exploration in reinforcement learning for robotic manipulation, offering incremental improvements in data-efficiency for researchers and practitioners in robotics and AI.

The paper tackles the challenge of learning long-horizon robotic manipulation tasks with sparse rewards. It proposes DEMO3, a framework that leverages multi-stage structure and demonstrations, improving data-efficiency by 40% on average and by up to 70% on difficult tasks compared to state-of-the-art methods.

Long-horizon tasks in robotic manipulation present significant challenges in reinforcement learning (RL) due to the difficulty of designing dense reward functions and effectively exploring the expansive state-action space. However, despite a lack of dense rewards, these tasks often have a multi-stage structure, which can be leveraged to decompose the overall objective into manageable subgoals. In this work, we propose DEMO3, a framework that exploits this structure for efficient learning from visual inputs. Specifically, our approach incorporates multi-stage dense reward learning, a bi-phasic training scheme, and world model learning into a carefully designed demonstration-augmented RL framework that strongly mitigates the challenge of exploration in long-horizon tasks. Our evaluations demonstrate that our method improves data-efficiency by an average of 40% and by 70% on particularly difficult tasks compared to state-of-the-art approaches. We validate this across 16 sparse-reward tasks spanning four domains, including challenging humanoid visual control tasks using as few as five demonstrations.
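As a rough illustration of how multi-stage structure can turn a sparse success signal into a denser one, the sketch below combines the number of completed stages with a within-stage progress score. This is not DEMO3's implementation: the function name `staged_dense_reward` and its parameters are hypothetical, and `progress` merely stands in for whatever learned per-stage score a method might provide.

```python
def staged_dense_reward(stage_index: int, progress: float, num_stages: int) -> float:
    """Map (completed stages, within-stage progress) to a reward in [0, 1].

    Hypothetical helper for illustration only; it is not taken from the paper.
    stage_index: number of stages already completed (0..num_stages).
    progress:    estimated progress inside the current stage, expected in [0, 1].
    num_stages:  total number of stages in the task.
    """
    if num_stages <= 0:
        raise ValueError("num_stages must be positive")
    if not 0 <= stage_index <= num_stages:
        raise ValueError("stage_index must lie in [0, num_stages]")
    if stage_index == num_stages:
        # All stages done: this matches the sparse task-success reward.
        return 1.0
    # Clamp the progress estimate so a noisy score cannot skip a stage.
    clamped = min(max(progress, 0.0), 1.0)
    return (stage_index + clamped) / num_stages


if __name__ == "__main__":
    # Two-stage task, first stage complete, halfway through the second.
    print(staged_dense_reward(stage_index=1, progress=0.5, num_stages=2))
```

Under this assumed scheme the reward never decreases when a stage is completed, so each subgoal acts as a checkpoint on the way to the final sparse success signal.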

Code Implementations: 1 repo
