CV · Oct 30, 2021

Leveraging SE(3) Equivariance for Self-Supervised Category-Level Object Pose Estimation

arXiv:2111.00190v1 · 61 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for reducing annotation costs in pose estimation for robotics and computer vision, though it is incremental as it builds on existing equivariant networks.

The paper tackles the problem of category-level 6D object pose estimation from single 3D point clouds without ground-truth annotations, CAD models, or multi-view supervision, achieving accuracy comparable to some fully supervised methods on datasets like ModelNet40 and NOCS-REAL 275.

Category-level object pose estimation aims to find 6D object poses of previously unseen object instances from known categories without access to object CAD models. To reduce the huge amount of pose annotations needed for category-level learning, we propose for the first time a self-supervised learning framework to estimate category-level 6D object pose from single 3D point clouds. During training, our method assumes no ground-truth pose annotations, no CAD models, and no multi-view supervision. The key to our method is to disentangle shape and pose through an invariant shape reconstruction module and an equivariant pose estimation module, empowered by SE(3) equivariant point cloud networks. The invariant shape reconstruction module learns to perform aligned reconstructions, yielding a category-level reference frame without using any annotations. In addition, the equivariant pose estimation module achieves category-level pose estimation accuracy that is comparable to some fully supervised methods. Extensive experiments demonstrate the effectiveness of our approach on both complete and partial depth point clouds from the ModelNet40 benchmark, and on real depth point clouds from the NOCS-REAL 275 dataset. The project page with code and visualizations can be found at: https://dragonlong.github.io/equi-pose.
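The invariant/equivariant split described in the abstract can be illustrated with a toy sketch. This is not the paper's network: the centroid stands in for an equivariant "pose" output, and the sorted pairwise distances stand in for an invariant "shape" output. All function names below are hypothetical and chosen only for illustration. The sketch applies a rigid transform (rotation about the z-axis plus a translation) to a random point cloud and checks that the centroid moves with the transform while the distance descriptor stays the same.

```python
import math
import random


def rotate_z(points, angle):
    """Rotate 3D points about the z-axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]


def translate(points, t):
    """Shift 3D points by the vector `t`."""
    return [(x + t[0], y + t[1], z + t[2]) for x, y, z in points]


def centroid(points):
    """Toy equivariant feature: moves with the cloud under rigid motion."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def pairwise_distances(points):
    """Toy invariant feature: unchanged by any rigid motion."""
    return sorted(
        math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]
    )


random.seed(0)
cloud = [tuple(random.uniform(-1.0, 1.0) for _ in range(3)) for _ in range(20)]

# An arbitrary rigid transform (assumed values, for illustration only).
angle, shift = 0.7, (0.5, -1.0, 2.0)
moved = translate(rotate_z(cloud, angle), shift)

# Equivariance: transforming the input transforms the output the same way.
expected_centroid = translate(rotate_z([centroid(cloud)], angle), shift)[0]
equivariant_ok = all(
    math.isclose(a, b, abs_tol=1e-9)
    for a, b in zip(centroid(moved), expected_centroid)
)

# Invariance: transforming the input leaves the output unchanged.
invariant_ok = all(
    math.isclose(a, b, abs_tol=1e-9)
    for a, b in zip(pairwise_distances(cloud), pairwise_distances(moved))
)

print(equivariant_ok, invariant_ok)
```

In the paper these two roles are played by learned modules built on SE(3) equivariant point cloud networks; the hand-written features here only demonstrate the two properties that the method relies on.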

