RO, CV | Jan 26, 2025

Bridging the Sim2Real Gap: Vision Encoder Pre-Training for Visuomotor Policy Transfer

Yash Yardi, Samuel Biruduganti, Lars Ankile

arXiv:2501.16389v2 | 7.83 citations | h-index: 7 | Has Code | 2025 International Conference on Electrical and Computer Engineering Researches (ICECER)

Originality: Incremental advance

AI Analysis

This work addresses the Sim2Real distribution shift problem in robot learning, offering an offline evaluation framework that is an incremental step toward more transferable policies.

The paper tackles the simulation-to-reality gap in robotic policy transfer by evaluating large-scale pre-trained vision encoders. It finds that manipulation-pretrained encoders achieve higher Action Scores, that CNNs show stronger domain invariance than ViTs, and that the best models combine both properties.

Simulation offers a scalable and efficient alternative to real-world data collection for learning visuomotor robotic policies. However, the simulation-to-reality (Sim2Real) distribution shift, introduced when simulation-trained policies are deployed in real-world environments, frequently prevents successful policy transfer. We present an offline framework for evaluating how well large-scale pre-trained vision encoders address the Sim2Real gap. We examine a diverse collection of encoders, assessing their ability to extract the features necessary for robot control (Action Score, AS) while remaining invariant to task-irrelevant environmental variations (Domain Invariance Score, DIS). Evaluating 23 encoders, we reveal patterns across architectures, pre-training datasets, and parameter scales. Our findings show that manipulation-pretrained encoders consistently achieve higher Action Scores, that CNN-based encoders demonstrate stronger domain invariance than ViTs, and that the best-performing models combine both properties, underscoring DIS and AS as complementary predictors of Sim2Real transferability.
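The abstract names the two metrics but does not give their formulas. The sketch below is one plausible way to operationalize them on precomputed, frozen encoder features, under assumptions that are ours and not necessarily the authors': the Action Score is taken as the held-out R² of a linear (ridge) probe from features to robot actions, and the Domain Invariance Score as how close a linear sim-versus-real classifier stays to chance, mapped to [0, 1] via `1 - 2|acc - 0.5|`. The function names `action_score` and `domain_invariance_score`, the ridge probes, and that mapping are all hypothetical.

```python
import numpy as np

# HYPOTHETICAL metric definitions for illustration only; the paper's
# actual Action Score and Domain Invariance Score may be computed differently.


def _ridge_fit(X, Y, lam=1e-3):
    """Closed-form ridge regression with a bias column appended to X."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    reg = lam * np.eye(Xb.shape[1])
    reg[-1, -1] = 0.0  # do not penalize the bias term
    return np.linalg.solve(Xb.T @ Xb + reg, Xb.T @ Y)


def _ridge_predict(W, X):
    """Apply ridge weights (including the bias row) to features X."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ W


def _split(n, frac, rng):
    """Random train/test index split."""
    idx = rng.permutation(n)
    cut = int(n * frac)
    return idx[:cut], idx[cut:]


def action_score(feats, actions, lam=1e-3, train_frac=0.8, seed=0):
    """Hypothetical AS: held-out R^2 of a linear probe from features to actions."""
    rng = np.random.default_rng(seed)
    tr, te = _split(len(feats), train_frac, rng)
    W = _ridge_fit(feats[tr], actions[tr], lam)
    pred = _ridge_predict(W, feats[te])
    ss_res = np.sum((actions[te] - pred) ** 2)
    ss_tot = np.sum((actions[te] - actions[te].mean(axis=0)) ** 2)
    return float(1.0 - ss_res / ss_tot)


def domain_invariance_score(sim_feats, real_feats, lam=1e-3, train_frac=0.8, seed=0):
    """Hypothetical DIS: 1.0 when a linear sim-vs-real classifier is at chance
    on held-out data, 0.0 when it separates the two domains perfectly."""
    rng = np.random.default_rng(seed)
    X = np.vstack([sim_feats, real_feats])
    y = np.concatenate([-np.ones(len(sim_feats)), np.ones(len(real_feats))])
    tr, te = _split(len(X), train_frac, rng)
    W = _ridge_fit(X[tr], y[tr], lam)  # least-squares classifier on +/-1 labels
    acc = np.mean(np.sign(_ridge_predict(W, X[te])) == y[te])
    return float(np.clip(1.0 - 2.0 * abs(acc - 0.5), 0.0, 1.0))


if __name__ == "__main__":
    # Synthetic stand-ins for encoder features; not data from the paper.
    rng = np.random.default_rng(42)
    d = 16
    sim = rng.normal(size=(400, d))
    real_same = rng.normal(size=(400, d))         # same distribution as sim
    real_shift = rng.normal(size=(400, d)) + 3.0  # strong domain offset
    true_W = rng.normal(size=(d, 7))              # hypothetical 7-D action space
    actions = sim @ true_W + 0.01 * rng.normal(size=(400, 7))

    print("AS:", action_score(sim, actions))
    print("DIS (no shift):", domain_invariance_score(sim, real_same))
    print("DIS (shifted):", domain_invariance_score(sim, real_shift))
```

Under this reading, an encoder scoring high on both metrics would be a strong Sim2Real candidate, which mirrors the abstract's observation that the best models combine both properties; the paper's actual protocol may differ.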

