RMBench: Memory-Dependent Robotic Manipulation Benchmark with Insights into Policy Design
This work addresses how to evaluate memory capabilities in robotic manipulation policies. Its contribution is incremental in that it builds on existing memory-aware policies.
The authors address the lack of systematic evaluation for memory-dependent robotic manipulation by introducing RMBench, a simulation benchmark with 9 tasks, and Mem-0, a modular policy designed for controlled ablation studies. With these, they identify memory-related limitations in existing policies and provide empirical insights into how architectural design choices affect memory performance.
Robotic manipulation policies have made rapid progress in recent years, yet most existing approaches give limited consideration to memory capabilities. Consequently, they struggle to solve tasks that require reasoning over historical observations and maintaining task-relevant information over time, which are common requirements in real-world manipulation scenarios. Although several memory-aware policies have been proposed, systematic evaluation of memory-dependent manipulation remains underexplored, and the relationship between architectural design choices and memory performance is still not well understood. To address this gap, we introduce RMBench, a simulation benchmark comprising 9 manipulation tasks that span multiple levels of memory complexity, enabling systematic evaluation of policy memory capabilities. We further propose Mem-0, a modular manipulation policy with explicit memory components designed to support controlled ablation studies. Through extensive simulation and real-world experiments, we identify memory-related limitations in existing policies and provide empirical insights into how architectural design choices influence memory performance. The website is available at https://rmbench.github.io/.