LG AI · Oct 20, 2025

Closing the Sim2Real Performance Gap in RL

Akhil S Anand, Shambhuraj Sawant, Jasper Hoffmann, Dirk Reinhardt, Sebastien Gros

arXiv:2510.17709v17.11 citations · h-index: 7

Originality Highly original

AI Analysis

This work addresses a critical problem for robotics and AI applications by reducing reliance on inaccurate simulation proxies, though it is incremental in that it builds on existing Sim2Real methods.

The paper tackles the Sim2Real performance gap in reinforcement learning, where policies trained in simulation degrade when deployed in the real world. It proposes a bi-level RL framework that directly adapts simulator parameters based on real-world performance, with theoretical and empirical validation on simple examples.

Sim2Real aims to train policies in high-fidelity simulation environments and transfer them effectively to the real world. Despite the development of accurate simulators and Sim2Real RL approaches, policies trained purely in simulation often suffer significant performance drops when deployed in real environments. This drop is referred to as the Sim2Real performance gap. Current Sim2Real RL methods optimize simulator accuracy and variability as proxies for real-world performance. However, as established theoretically and empirically in the literature, these metrics do not necessarily correlate with the real-world performance of the policy. We propose a novel framework that addresses this issue by directly adapting the simulator parameters based on real-world performance. We frame the problem as bi-level RL: the inner level trains a policy purely in simulation, and the outer level adapts the simulation model and in-sim reward parameters to maximize the real-world performance of the in-sim policy. We derive, and validate on simple examples, the mathematical tools needed to develop bi-level RL algorithms that close the Sim2Real performance gap.
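The bi-level structure described in the abstract can be illustrated with a toy sketch. This is not the authors' algorithm: it assumes a hypothetical scalar linear system, replaces the inner-level RL with a Riccati iteration that returns the policy that is optimal in the simulator, and replaces the outer-level RL with finite-difference gradient descent on a single simulator parameter. All names and numeric values (`A_REAL`, `B_SIM`, `train_policy_in_sim`, and so on) are made up for illustration, and only the dynamics parameter is adapted, not the in-sim reward.

```python
# Toy bi-level loop on a scalar system x' = a*x + b*u with stage cost x^2 + u^2.
# Every name and number below is an illustrative assumption, not from the paper.
A_REAL, B_REAL = 1.2, 1.0  # "real" dynamics, hidden from the simulator
B_SIM = 0.5                # fixed structural error the simulator cannot correct


def train_policy_in_sim(a_sim, b_sim=B_SIM, iters=500):
    """Inner level: gain K of the policy u = -K*x that is optimal in the simulator.

    Stand-in for in-sim RL: fixed-point iteration of the scalar discrete-time
    Riccati equation with unit state and input weights.
    """
    p = 1.0
    for _ in range(iters):
        p = 1.0 + a_sim**2 * p / (1.0 + b_sim**2 * p)
    return a_sim * b_sim * p / (1.0 + b_sim**2 * p)


def real_world_cost(k, x0=1.0, horizon=50):
    """Roll out the policy u = -k*x on the real system and sum the stage costs."""
    x, cost = x0, 0.0
    for _ in range(horizon):
        u = -k * x
        cost += x**2 + u**2
        x = A_REAL * x + B_REAL * u
    return cost


def outer_loop(theta=A_REAL, lr=0.02, steps=100, eps=1e-4):
    """Outer level: adapt the simulator parameter theta (its value of a) by
    central finite-difference gradient descent on the real-world cost of the
    policy trained in the simulator."""
    for _ in range(steps):
        j_plus = real_world_cost(train_policy_in_sim(theta + eps))
        j_minus = real_world_cost(train_policy_in_sim(theta - eps))
        theta -= lr * (j_plus - j_minus) / (2 * eps)
    return theta


theta_final = outer_loop()
print("real cost with accurate a:", real_world_cost(train_policy_in_sim(A_REAL)))
print("real cost with adapted a: ", real_world_cost(train_policy_in_sim(theta_final)))
```

In this toy setting the outer loop starts from the true value of `a` and moves the simulator parameter away from it, because a less accurate `a` compensates for the fixed error in `b`. That mirrors the abstract's point that simulator accuracy is only a proxy for real-world performance.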
