λ: A Benchmark for Data-Efficiency in Long-Horizon Indoor Mobile Manipulation Robotics
This work addresses the need for realistic benchmarks to improve data efficiency in robotics for household and workplace applications, representing an incremental advance in benchmarking methodology.
The paper tackles the problem of data inefficiency in learning long-horizon mobile manipulation tasks by introducing the LAMBDA benchmark, which evaluates models on language-conditioned, multi-room pick-and-place tasks using 571 human-collected demonstrations. It finds that a neuro-symbolic method outperforms end-to-end learning, achieving higher success rates with lower data requirements.
Learning to execute long-horizon mobile manipulation tasks is crucial for advancing robotics in household and workplace settings. However, current approaches are typically data-inefficient, underscoring the need for improved models and for realistically sized benchmarks to evaluate their efficiency. To address this, we introduce the LAMBDA (λ) benchmark (Long-horizon Actions for Mobile-manipulation Benchmarking of Directed Activities), which evaluates the data efficiency of models on language-conditioned, long-horizon, multi-room, multi-floor, pick-and-place tasks using a dataset of manageable size that is more feasible to collect. Our benchmark includes 571 human-collected demonstrations that provide realism and diversity in simulated and real-world settings. Unlike planner-generated data, these trajectories offer natural variability and replay-verifiability, ensuring robust learning and evaluation. We use λ to benchmark current end-to-end learning methods and a modular neuro-symbolic approach that combines foundation models with task and motion planning. We find that learning methods, even when pretrained, yield lower success rates, while a neuro-symbolic method performs significantly better and requires less data.