CV | Mar 29, 2022

OSOP: A Multi-Stage One Shot Object Pose Estimation Framework

arXiv:2203.15533v2 | 119 citations | h-index: 46
AI Analysis

This work addresses the need for efficient object pose estimation in robotics or AR/VR. It is incremental, however, as it builds on existing template-based and CNN methods.

The authors tackled the problem of object detection and 6 DoF pose estimation without training on target objects, achieving competitive performance on multiple datasets compared to state-of-the-art methods trained on synthetic data.

We present a novel one-shot method for object detection and 6 DoF pose estimation that does not require training on target objects. At test time, it takes as input a target image and a textured 3D query model. The core idea is to represent a 3D model with a number of 2D templates rendered from different viewpoints. This enables CNN-based direct dense feature extraction and matching. The object is first localized in 2D, then its approximate viewpoint is estimated, followed by dense 2D-3D correspondence prediction. The final pose is computed with PnP. We evaluate the method on the LineMOD, Occlusion, Homebrewed, YCB-V and TLESS datasets and report very competitive performance in comparison to state-of-the-art methods trained on synthetic data, even though our method is not trained on the object models used for testing.
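The template idea in the abstract can be illustrated with a small sketch. The abstract does not say how viewpoints are sampled or how templates are compared, so everything here is an assumption made for illustration: the Fibonacci-lattice sampling, the cosine-similarity matching, and the hypothetical helper names `sample_viewpoints` and `best_template`. The toy descriptors stand in for the CNN features the method actually extracts from rendered templates.

```python
import math

def sample_viewpoints(n, radius=1.0):
    """Return n camera positions spread roughly evenly on a sphere.

    Assumption: a Fibonacci lattice; the abstract does not specify the sampling.
    """
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n      # height, strictly inside (-1, 1)
        r = math.sqrt(1.0 - z * z)         # radius of the circle at that height
        theta = golden_angle * i
        points.append((radius * r * math.cos(theta),
                       radius * r * math.sin(theta),
                       radius * z))
    return points

def cosine_similarity(a, b):
    """Cosine of the angle between two descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def best_template(query_descriptor, template_descriptors):
    """Index of the template whose descriptor is most similar to the query."""
    scores = [cosine_similarity(query_descriptor, t) for t in template_descriptors]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy usage: one template per viewpoint, with the unit viewing direction
# standing in for a learned descriptor (an assumption, not the paper's features).
viewpoints = sample_viewpoints(8, radius=2.0)
templates = [tuple(c / 2.0 for c in v) for v in viewpoints]
query = templates[3]
idx = best_template(query, templates)
```

In this sketch the selected index identifies the approximate viewpoint. In the pipeline described above, that step would be followed by dense 2D-3D correspondence prediction and PnP to obtain the final pose.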

