Style-transfer GANs for bridging the domain gap in synthetic pose estimator training
This work addresses the problem of reducing data dependency for pose estimation in computer vision, though the contribution is incremental because it builds on existing GAN methods.
The paper tackles the domain gap between synthetic and real data when training pose estimators by using style-transfer GANs for image translation, which yields a considerable improvement in model performance compared to a model trained with the same degree of domain randomization.
Given the dependency of current CNN architectures on a large training set, the possibility of using synthetic data is alluring, as it allows the generation of a virtually infinite amount of labeled training data. However, producing such data is a non-trivial task, as current CNN architectures are sensitive to the domain gap between real and synthetic data. We propose to adopt general-purpose GAN models for pixel-level image translation, which allows the domain gap itself to be formulated as a learning problem. The obtained models are then used either during training or during inference to bridge the domain gap. Here, we focus on training the single-stage YOLO6D object pose estimator on synthetic CAD geometry only, where not even approximate surface information is available. When employing paired GAN models, we use an edge-based intermediate domain and introduce different mappings to represent the unknown surface properties. Our evaluation shows a considerable improvement in model performance when compared to a model trained with the same degree of domain randomization, while requiring very little additional effort.
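
To give a rough intuition for the edge-based intermediate domain mentioned above, the following minimal sketch maps two images with different intensities (standing in for a synthetic render and a real photograph) to the same binary edge map. It is an illustration only: the abstract does not state which edge operator is used, so the Sobel filter, the threshold value, and the helper names `sobel_edge_map` and `make_step_image` are assumptions introduced here. The code is kept dependency-free; in practice an image-processing library would normally be used instead.

```python
# Illustrative sketch only: the Sobel operator and the threshold are
# assumptions, not the edge extraction described in the paper.

def sobel_edge_map(image, threshold=1.0):
    """Return a binary edge map for a 2D grayscale image (list of rows).

    Border pixels are left at 0 because the 3x3 kernel does not fit there.
    """
    height, width = len(image), len(image[0])
    kx = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))   # horizontal gradient kernel
    ky = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))   # vertical gradient kernel
    edges = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gy = 0.0
            for dy in range(3):
                for dx in range(3):
                    pixel = image[y + dy - 1][x + dx - 1]
                    gx += kx[dy][dx] * pixel
                    gy += ky[dy][dx] * pixel
            magnitude = (gx * gx + gy * gy) ** 0.5
            edges[y][x] = 1 if magnitude >= threshold else 0
    return edges


def make_step_image(height=6, width=6, left=0.0, right=1.0):
    """Hypothetical test image: a vertical step edge in the middle."""
    half = width // 2
    return [[left] * half + [right] * (width - half) for _ in range(height)]


# Same geometry, different "surface" intensities.
synthetic = make_step_image(left=0.0, right=1.0)
real_like = make_step_image(left=0.2, right=0.9)

# Both images collapse to the same representation in the edge domain.
same_edges = sobel_edge_map(synthetic) == sobel_edge_map(real_like)
```

Here `same_edges` evaluates to `True`: the geometry survives while the intensity differences are discarded. This is only the intuition behind a shared intermediate domain; the GAN-based translation and the mappings for unknown surface properties described in the abstract are not modeled in this sketch.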