CV · Nov 24, 2021

UDA-COPE: Unsupervised Domain Adaptation for Category-level Object Pose Estimation

arXiv:2111.12580v2 · 43 citations
Originality: Incremental advance
AI Analysis

This work addresses the cost and labor of obtaining real-world pose labels for robotics and computer vision applications. It is an incremental advance, however, because it builds on existing multi-modal UDA techniques.

The paper tackles the problem of expensive ground-truth labels for category-level object pose estimation by proposing UDA-COPE, an unsupervised domain adaptation method using a teacher-student self-supervised learning scheme and bidirectional filtering, achieving comparable or superior performance to methods that rely on ground-truth labels.

Learning to estimate object pose often requires ground-truth (GT) labels, such as CAD models and absolute-scale object poses, which are expensive and laborious to obtain in the real world. To tackle this problem, we propose an unsupervised domain adaptation (UDA) method for category-level object pose estimation, called UDA-COPE. Inspired by recent multi-modal UDA techniques, the proposed method exploits a teacher-student self-supervised learning scheme to train a pose estimation network without using target-domain pose labels. We also introduce a bidirectional filtering method between the predicted normalized object coordinate space (NOCS) map and the observed point cloud, which not only makes our teacher network more robust to the target domain but also provides more reliable pseudo labels for training the student network. Extensive experimental results demonstrate the effectiveness of our proposed method both quantitatively and qualitatively. Notably, without leveraging target-domain GT labels, our proposed method achieved comparable or sometimes superior performance to existing methods that depend on GT labels.
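
The abstract does not spell out how the bidirectional filtering is computed, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that each predicted NOCS coordinate is paired with one observed point, and that a candidate similarity transform (scale, rotation, translation) is already available, for example from a separate pose fit. A pair is kept only if it is consistent in both directions: NOCS mapped into camera space, and the observed point mapped back into NOCS. The function names, the tolerances `cam_tol` and `nocs_tol`, and the toy data are all hypothetical.

```python
import math


def apply_similarity(point, scale, rotation, translation):
    """Map a NOCS point into camera space: scale * R @ point + t."""
    return tuple(
        scale * sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


def invert_similarity(point, scale, rotation, translation):
    """Map a camera-space point back into NOCS: R^T @ (point - t) / scale."""
    diff = [point[j] - translation[j] for j in range(3)]
    return tuple(
        sum(rotation[j][i] * diff[j] for j in range(3)) / scale
        for i in range(3)
    )


def bidirectional_filter(nocs, cloud, scale, rotation, translation,
                         cam_tol, nocs_tol):
    """Return one boolean per (NOCS, observed point) pair.

    Assumption: a pair is reliable only if both residuals are small,
    the forward one in camera space and the backward one in NOCS space.
    """
    keep = []
    for n, p in zip(nocs, cloud):
        forward = math.dist(apply_similarity(n, scale, rotation, translation), p)
        backward = math.dist(invert_similarity(p, scale, rotation, translation), n)
        keep.append(forward <= cam_tol and backward <= nocs_tol)
    return keep


# Toy data (hypothetical): identity rotation, scale 2, shift of 1 along z.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
nocs_pred = [(0.1, 0.2, 0.3), (0.5, 0.5, 0.5), (0.0, 0.0, 0.0)]
observed = [(0.2, 0.4, 1.6), (1.0, 1.0, 2.0), (0.5, 0.0, 1.0)]  # last pair is noisy

mask = bidirectional_filter(nocs_pred, observed, 2.0, identity, (0.0, 0.0, 1.0),
                            cam_tol=0.05, nocs_tol=0.05)
print(mask)
```

In this simplified form the backward residual is just the forward residual divided by the scale, so the two checks differ only in the space where the tolerance is expressed. The paper's actual criterion may be different. Pairs that pass such a filter would be the ones used as pseudo labels for the student network.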

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
