REALM: A Real-to-Sim Validated Benchmark for Generalization in Robotic Manipulation
REALM provides a validated simulation tool for researchers to systematically test robotic manipulation models, though the contribution is incremental because it builds on existing VLA frameworks.
The authors address the challenge of evaluating generalization in Vision-Language-Action models for robotics by introducing REALM, a simulation benchmark with high-fidelity visuals and aligned control. Their evaluation shows that current models such as π₀ and GR00T N1.5 still struggle with generalization and robustness.
Vision-Language-Action (VLA) models empower robots to understand and execute tasks described by natural language instructions. However, a key challenge lies in their ability to generalize beyond the specific environments and conditions they were trained on, and this ability is presently difficult and expensive to evaluate in the real world. To address this gap, we present REALM, a new simulation environment and benchmark designed to evaluate the generalization capabilities of VLA models, with a specific emphasis on establishing a strong correlation between simulated and real-world performance through high-fidelity visuals and aligned robot control. Our environment offers a suite of 15 perturbation factors, 7 manipulation skills, and more than 3,500 objects. Finally, we establish two task sets that form our benchmark and evaluate the π₀, π₀-FAST, and GR00T N1.5 VLA models, showing that generalization and robustness remain an open challenge. More broadly, we also show that simulation gives us a valuable proxy for the real world and allows us to systematically probe for and quantify the weaknesses and failure modes of VLAs. Project page: https://martin-sedlacek.com/realm
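The abstract's claim of a correlation between simulated and real-world performance can be made concrete with a small sketch. One common way to quantify such agreement is the Pearson correlation coefficient over paired per-task success rates. The abstract does not say which statistic REALM uses, so the function name `pearson_r` and the choice of Pearson correlation are assumptions, and the numbers below are placeholder values for illustration, not results from the paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-task success rates (placeholders, NOT data from REALM).
sim_success = [0.2, 0.4, 0.6, 0.8]
real_success = [0.1, 0.3, 0.5, 0.7]

r = pearson_r(sim_success, real_success)
print(f"sim-to-real Pearson r = {r:.3f}")
```

A value of r near 1 would indicate that tasks that are easier in simulation also tend to be easier on the real robot, which is the property that lets simulation act as a proxy. A rank-based statistic could be swapped in if only the ordering of tasks or models matters.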