Closing the Reality Gap with Unsupervised Sim-to-Real Image Translation
This work addresses the data scarcity problem in robotics and computer vision by reducing reliance on expensive real-world data collection, although it represents an incremental advance in sim-to-real transfer.
The paper tackles the simulation-to-reality gap in computer vision by introducing an unsupervised image-to-image translation method that applies real-world styles to synthetic images. A model trained on the translated data and run on autonomous soccer robots shows a significant improvement over models trained purely on simulated images.
Deep learning approaches have become the standard solution to many problems in computer vision and robotics, but obtaining sufficient training data of high enough quality is challenging, as human labor is error-prone, time-consuming, and expensive. Solutions based on simulation have become more popular in recent years, but the gap between simulation and reality is still a major issue. In this paper, we introduce a novel method for augmenting synthetic image data through unsupervised image-to-image translation, applying the style of real-world images to simulated images using open-source frameworks. The generated dataset is combined with conventional augmentation methods and is then used to train a neural network model that runs in real time on autonomous soccer robots. Our evaluation shows a significant improvement compared to models trained on images generated entirely in simulation.
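The data flow described in the abstract (style translation of simulated images, followed by conventional augmentation) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `placeholder_translate` and `augment_dataset`, the grayscale list-of-lists image type, and the choice of flip and brightness augmentations are all assumptions made for illustration. In particular, `placeholder_translate` only copies its input and stands in for a trained image-to-image translation network.

```python
import random
from typing import Callable, List

# Assumption: a grayscale image as rows of pixel values in [0, 1].
Image = List[List[float]]


def placeholder_translate(image: Image) -> Image:
    """Hypothetical stand-in for a trained sim-to-real translation model.

    A real pipeline would run the translation network here; this copy
    only keeps the sketch runnable.
    """
    return [row[:] for row in image]


def horizontal_flip(image: Image) -> Image:
    """Mirror each row (a conventional geometric augmentation)."""
    return [row[::-1] for row in image]


def adjust_brightness(image: Image, factor: float) -> Image:
    """Scale pixel values by `factor` and clamp them to [0, 1]."""
    return [[min(1.0, max(0.0, p * factor)) for p in row] for row in image]


def augment_dataset(
    synthetic: List[Image],
    translate: Callable[[Image], Image],
    rng: random.Random,
    flip_prob: float = 0.5,
    brightness_range: tuple = (0.8, 1.2),
) -> List[Image]:
    """Translate each simulated image, then apply conventional augmentation."""
    augmented = []
    for image in synthetic:
        styled = translate(image)  # step 1: apply real-world style
        if rng.random() < flip_prob:  # step 2: random horizontal flip
            styled = horizontal_flip(styled)
        factor = rng.uniform(*brightness_range)  # step 3: brightness jitter
        augmented.append(adjust_brightness(styled, factor))
    return augmented


if __name__ == "__main__":
    rng = random.Random(0)
    simulated = [[[0.0, 0.5], [1.0, 0.25]]]
    print(augment_dataset(simulated, placeholder_translate, rng))
```

Note that the sketch handles images only. In a detection or segmentation setting, any geometric augmentation such as the flip would have to be applied to the labels as well.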