CV Feb 14, 2023

MSDA: Monocular Self-supervised Domain Adaptation for 6D Object Pose Estimation

arXiv:2302.07300v1
3 citations
h-index: 45
Originality: Incremental advance
AI Analysis

This paper addresses the synthetic-to-real domain gap in 6D pose estimation for robotics and AR/VR applications and offers a practical way to reduce labeling costs. The advance is incremental, since the method builds on an existing pose estimator (SC6D).

The paper tackles the high cost of real-world labeling for 6D object pose estimation. It proposes a self-supervised domain adaptation method that pre-trains on synthetic data and then fine-tunes on real RGB(-D) data without pose labels. On the YCB-Video dataset, the method performs comparably to its fully-supervised counterpart and outperforms existing state-of-the-art approaches.

Acquiring labeled 6D poses from real images is an expensive and time-consuming task. Though massive amounts of synthetic RGB images are easy to obtain, the models trained on them suffer from noticeable performance degradation due to the synthetic-to-real domain gap. To mitigate this degradation, we propose a practical self-supervised domain adaptation approach that takes advantage of real RGB(-D) data without needing real pose labels. We first pre-train the model with synthetic RGB images and then utilize real RGB(-D) images to fine-tune the pre-trained model. The fine-tuning process is self-supervised by the RGB-based pose-aware consistency and the depth-guided object distance pseudo-label, which does not require the time-consuming online differentiable rendering. We build our domain adaptation method based on the recent pose estimator SC6D and evaluate it on the YCB-Video dataset. We experimentally demonstrate that our method achieves comparable performance against its fully-supervised counterpart while outperforming existing state-of-the-art approaches.
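
To make the idea of a depth-guided object distance pseudo-label more concrete, here is a minimal sketch. It is not the paper's implementation: the abstract does not say how the pseudo-label is computed, so the sketch assumes it can be approximated by the median of valid depth readings inside an object mask. The function names, the `min_valid` threshold, the convention that a depth of 0 means "missing", and the L1 penalty are all hypothetical choices made for illustration.

```python
from statistics import median
from typing import Optional, Sequence


def distance_pseudo_label(
    depth: Sequence[Sequence[float]],
    mask: Sequence[Sequence[bool]],
    min_valid: int = 3,
) -> Optional[float]:
    """Hypothetical helper: median valid depth inside the object mask.

    Returns None when too few valid readings exist, so the caller can
    skip the distance term for that image.
    """
    values = [
        d
        for depth_row, mask_row in zip(depth, mask)
        for d, inside in zip(depth_row, mask_row)
        if inside and d > 0.0  # assumption: 0 marks a missing depth reading
    ]
    if len(values) < min_valid:
        return None
    return median(values)


def distance_loss(predicted: float, pseudo_label: Optional[float]) -> float:
    """Assumed L1 penalty between predicted distance and the pseudo-label."""
    if pseudo_label is None:
        return 0.0  # no usable depth, so no supervision from this term
    return abs(predicted - pseudo_label)


# Toy 3x3 depth map in metres; the background sits at 2.5 m.
depth = [
    [0.0, 0.80, 0.82],
    [2.5, 0.81, 0.0],
    [2.5, 2.5, 0.79],
]
mask = [
    [True, True, True],
    [False, True, True],
    [False, False, True],
]

label = distance_pseudo_label(depth, mask)
print(label, distance_loss(0.9, label))
```

A label built this way needs no rendering step, which matches the abstract's remark about avoiding online differentiable rendering. Note that a median over the mask measures the distance to the visible surface rather than to the object center, so a real system would likely need a correction for object extent.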

