RO · May 20

VLA-REPLICA: A Low-Cost, Reproducible Benchmark for Real-World Evaluation of Vision-Language-Action Models

Alex S. Huang, Jiahui Zhang, Shiqing Tang, Yu Xiang

arXiv:2605.20774 · 67.5

AI Analysis

This benchmark addresses the lack of accessible, reproducible real-world evaluation available to robotic manipulation researchers, though the contribution is incremental, as it builds on existing benchmark concepts.

VLA-REPLICA provides a low-cost, reproducible real-world benchmark for evaluating Vision-Language-Action models. It enables consistent policy evaluation across labs on a diverse set of manipulation tasks, and its reproducibility is demonstrated by consistent results across independently constructed setups.

Vision-Language-Action (VLA) models have shown strong promise for general-purpose robotic manipulation, but their real-world evaluation remains limited by a lack of accessible, reproducible, and consistent benchmarks. Simulation benchmarks fail to capture real-world complexity, while existing real-world benchmarks often require expensive hardware or centralized evaluation, or are limited in task diversity. We introduce VLA-REPLICA, a low-cost, easily reproducible real-world benchmark for evaluating VLA models. Built from off-the-shelf components, our system can be quickly assembled and replicated across laboratories, providing a consistent environment for policy evaluation anywhere in the world. VLA-REPLICA includes a diverse suite of manipulation tasks and a small-scale demonstration dataset for target-domain adaptation, along with real-world evaluation protocols for both in-distribution and out-of-distribution settings. Experiments with imitation learning and state-of-the-art VLA models reveal the models' strengths and limitations, while consistent results across independently constructed setups demonstrate the reproducibility of our benchmark.
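The abstract says that reproducibility is shown by consistent results across independently constructed setups, but this summary does not specify how consistency is measured. The sketch below illustrates one possible check, not the authors' protocol. It compares per-setting success rates (in-distribution vs. out-of-distribution) from two labs using a standard pooled two-proportion z-test. The names `TrialCounts`, `two_proportion_z`, and `consistent`, the 1.96 threshold, and all trial counts are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrialCounts:
    """Hypothetical record: successful episodes out of total trials for one
    evaluation setting (e.g. in-distribution) on one physical setup."""
    successes: int
    trials: int

    @property
    def rate(self) -> float:
        return self.successes / self.trials


def two_proportion_z(a: TrialCounts, b: TrialCounts) -> float:
    """Pooled two-proportion z statistic for H0: both setups share one success rate."""
    pooled = (a.successes + b.successes) / (a.trials + b.trials)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / a.trials + 1.0 / b.trials))
    if se == 0.0:
        # Both setups are all-success or all-failure, so the rates are identical.
        return 0.0
    return (a.rate - b.rate) / se


def consistent(a: TrialCounts, b: TrialCounts, z_crit: float = 1.96) -> bool:
    """True if the rate difference is not significant at roughly the 5% level
    (two-sided). Assumed criterion for illustration only."""
    return abs(two_proportion_z(a, b)) < z_crit


# Illustrative numbers only; these are NOT results reported for VLA-REPLICA.
lab_a = {"in_distribution": TrialCounts(14, 20), "out_of_distribution": TrialCounts(6, 20)}
lab_b = {"in_distribution": TrialCounts(12, 20), "out_of_distribution": TrialCounts(7, 20)}

for setting in lab_a:
    z = two_proportion_z(lab_a[setting], lab_b[setting])
    print(f"{setting}: {lab_a[setting].rate:.2f} vs {lab_b[setting].rate:.2f}, "
          f"z = {z:+.2f}, consistent = {consistent(lab_a[setting], lab_b[setting])}")
```

With only about 20 trials per setting, this test has low power. Failing to reject equal rates is therefore weak evidence that two setups agree. An equivalence test such as TOST, or reporting confidence intervals on the difference, would give a stronger reproducibility claim.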

