Robust Category-Level 3D Pose Estimation from Synthetic Data
The work addresses the high cost of annotating real-world images for 3D pose estimation, a task that underpins computer vision applications such as 3D reconstruction, by offering a more data-efficient way to train pose estimators.
The paper tackles the domain shift between synthetic and real data in 3D object pose estimation. It introduces a new synthetic dataset (SyntheticP3D) and a novel training approach (CC3D). Together they perform competitively with state-of-the-art models using only 10% of the real training images, and they outperform the SOTA model by 10.4% at a π/18 threshold using 50% of the real data.
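Results such as the 10.4% gain are reported as pose accuracy under an angular threshold. In category-level pose estimation this is commonly computed from the geodesic distance between the predicted and ground-truth rotations, Δ(R₁, R₂) = ‖log(R₁ᵀR₂)‖_F / √2, which equals the rotation angle of R₁ᵀR₂. A prediction counts as correct when Δ falls below the threshold (π/18 is 10°). The summary does not define the metric, so this minimal sketch assumes that convention; the helper names (`geodesic_error`, `accuracy_at`, `rot_z`) are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def matmul_transpose_a(a: Matrix, b: Matrix) -> Matrix:
    """Return a^T @ b for 3x3 matrices stored as nested lists."""
    return [[sum(a[k][i] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def geodesic_error(r_pred: Matrix, r_gt: Matrix) -> float:
    """Rotation angle (radians) of r_pred^T r_gt, i.e. ||log(R1^T R2)||_F / sqrt(2)."""
    rel = matmul_transpose_a(r_pred, r_gt)
    trace = rel[0][0] + rel[1][1] + rel[2][2]
    # Clamp to [-1, 1] so floating-point drift cannot push acos out of its domain.
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.acos(cos_angle)


def accuracy_at(errors: Sequence[float], threshold: float) -> float:
    """Fraction of samples whose rotation error is below the threshold."""
    return sum(e < threshold for e in errors) / len(errors)


def rot_z(theta: float) -> Matrix:
    """Rotation by theta radians about the z-axis (toy data for the example)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


if __name__ == "__main__":
    gt = rot_z(0.0)
    preds = [rot_z(math.radians(d)) for d in (5, 8, 15, 40)]
    errors = [geodesic_error(p, gt) for p in preds]
    print(accuracy_at(errors, math.pi / 18))  # 0.5
    print(accuracy_at(errors, math.pi / 6))   # 0.75
```

In this toy example, two of the four predictions (5° and 8° off) fall under π/18, giving an accuracy of 0.5. At the looser π/6 threshold, the 15° prediction also counts and accuracy rises to 0.75.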
Obtaining accurate 3D object poses is vital for numerous computer vision applications, such as 3D reconstruction and scene understanding. However, annotating real-world objects is time-consuming and challenging. While synthetically generated training data is a viable alternative, the domain shift between real and synthetic data is a significant challenge. In this work, we aim to narrow the performance gap between models trained on synthetic data plus a few real images and fully supervised models trained on large-scale data. We approach the problem from two perspectives: 1) We introduce SyntheticP3D, a new synthetic dataset for object pose estimation generated from CAD models and enhanced with a novel algorithm. 2) We propose a novel approach (CC3D) for training neural mesh models that perform pose estimation via inverse rendering. In particular, we exploit the spatial relationships between features on the mesh surface and a contrastive learning scheme to guide the domain adaptation process. Combined, these two approaches enable our models to perform competitively with state-of-the-art models using only 10% of the respective real training images, while outperforming the SOTA model by 10.4% at a threshold of π/18 using only 50% of the real training data. Our trained model further demonstrates robust generalization to out-of-distribution scenarios despite being trained with minimal real data.
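The abstract does not give CC3D's loss. As an illustrative sketch only, one plausible way to combine contrastive learning with spatial relationships on a mesh is a vertex-level InfoNCE loss. Each image feature is pulled toward the feature of its corresponding mesh vertex and pushed away from the other vertices, while vertices that lie close to the positive on the surface are excluded from the negatives so that neighboring features are not forced apart. The function names, `neighbor_radius`, and the temperature value are hypothetical, and vectors are assumed to be non-zero.

```python
import math
from typing import Sequence

Vec = Sequence[float]


def cosine(a: Vec, b: Vec) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def vertex_contrastive_loss(
    pixel_features: Sequence[Vec],
    vertex_ids: Sequence[int],
    vertex_features: Sequence[Vec],
    vertex_positions: Sequence[Vec],
    neighbor_radius: float,
    temperature: float = 0.07,
) -> float:
    """Mean InfoNCE loss over pixels, each matched to one mesh vertex (hypothetical sketch).

    Vertices closer than `neighbor_radius` (3D distance) to the positive vertex
    are dropped from the negatives, so nearby surface points are not pushed apart.
    """
    total = 0.0
    for feat, vid in zip(pixel_features, vertex_ids):
        pos_logit = cosine(feat, vertex_features[vid]) / temperature
        logits = [pos_logit]
        for j, (v_feat, v_pos) in enumerate(zip(vertex_features, vertex_positions)):
            if j == vid or math.dist(vertex_positions[vid], v_pos) < neighbor_radius:
                continue  # skip the positive itself and its spatial neighbors
            logits.append(cosine(feat, v_feat) / temperature)
        # -log softmax of the positive logit, via the log-sum-exp trick for stability
        m = max(logits)
        log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum - pos_logit
    return total / len(pixel_features)


if __name__ == "__main__":
    positions = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
    pixel = [[1.0, 0.0]]  # observed feature that belongs to vertex 0
    with_neighbor = vertex_contrastive_loss(pixel, [0], features, positions, neighbor_radius=0.0)
    without_neighbor = vertex_contrastive_loss(pixel, [0], features, positions, neighbor_radius=0.5)
    print(with_neighbor > without_neighbor)  # True
```

In the toy usage, vertex 1 sits 0.1 units from vertex 0 and has a nearly identical feature. Treating it as a negative inflates the loss, whereas excluding it via `neighbor_radius` leaves the loss near zero. In practice the radius and temperature would need tuning to the mesh resolution.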