CV RO | Jul 21, 2023

KVN: Keypoints Voting Network with Differentiable RANSAC for Stereo Pose Estimation

arXiv:2307.11543v31.56 citations | h-index: 21 | Has Code

Originality: Incremental advance

AI Analysis

This paper addresses the non-differentiability of RANSAC in keypoint-based object pose estimation for robotics and augmented reality applications, offering an incremental improvement over existing methods.

The paper tackles stereo image-based object pose estimation by introducing a differentiable RANSAC layer into a monocular network and using an uncertainty-driven multi-view PnP solver, achieving state-of-the-art results on public and custom datasets.

Object pose estimation is a fundamental computer vision task exploited in several robotics and augmented reality applications. Many established approaches rely on predicting 2D-3D keypoint correspondences using RANSAC (Random Sample Consensus) and estimating the object pose using the PnP (Perspective-n-Point) algorithm. Because RANSAC is non-differentiable, correspondences cannot be directly learned in an end-to-end fashion. In this paper, we address the stereo image-based object pose estimation problem by i) introducing a differentiable RANSAC layer into a well-known monocular pose estimation network; ii) exploiting an uncertainty-driven multi-view PnP solver that can fuse information from multiple views. We evaluate our approach on a challenging public stereo object pose estimation dataset and on a custom-built dataset that we call the Transparent Tableware Dataset (TTD), yielding state-of-the-art results against other recent approaches. Furthermore, in our ablation study, we show that the differentiable RANSAC layer plays a significant role in the accuracy of the proposed method. With this paper, we release the code of our method and the TTD dataset.
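
To illustrate why replacing RANSAC's hard selection makes keypoint localization learnable, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a voting scheme in which each pixel predicts a unit direction toward a keypoint (as the "Keypoints Voting" in the title suggests). The function names and the parameters `beta`, `tau`, and `alpha` are hypothetical. Classic RANSAC would count inliers with a hard threshold and keep the single best hypothesis (an argmax); here the inlier count is a sigmoid and the selection is a softmax-weighted average, both of which are smooth operations.

```python
import numpy as np

def intersect(p1, d1, p2, d2):
    """Intersect the 2D lines p1 + t*d1 and p2 + s*d2."""
    A = np.stack([d1, -d2], axis=1)       # 2x2 matrix with columns d1 and -d2
    t, _ = np.linalg.solve(A, p2 - p1)    # solve t*d1 - s*d2 = p2 - p1
    return p1 + t * d1

def soft_ransac_keypoint(pixels, dirs, n_hyp=64, beta=20.0, tau=0.99,
                         alpha=0.5, seed=0):
    """Soft (smooth) variant of RANSAC voting for one 2D keypoint.

    pixels: (N, 2) float pixel coordinates.
    dirs:   (N, 2) unit vectors, each pointing from a pixel to the keypoint.
    Returns the weighted keypoint estimate and the hypothesis weights.
    All hyperparameters are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    hyps = []
    for _ in range(n_hyp):
        i, j = rng.choice(len(pixels), size=2, replace=False)
        cross = dirs[i, 0] * dirs[j, 1] - dirs[i, 1] * dirs[j, 0]
        if abs(cross) < 1e-6:             # skip near-parallel vote pairs
            continue
        hyps.append(intersect(pixels[i], dirs[i], pixels[j], dirs[j]))
    if not hyps:
        raise ValueError("no valid hypothesis could be generated")
    hyps = np.array(hyps)                                   # (H, 2)

    # Unit vectors from every pixel to every hypothesis.
    to_h = hyps[:, None, :] - pixels[None, :, :]            # (H, N, 2)
    to_h = to_h / (np.linalg.norm(to_h, axis=-1, keepdims=True) + 1e-9)
    cos = np.sum(to_h * dirs[None, :, :], axis=-1)          # (H, N)

    # Soft inlier count: a sigmoid replaces the hard test cos > tau.
    scores = (1.0 / (1.0 + np.exp(-beta * (cos - tau)))).sum(axis=1)

    # Soft selection: a softmax over scores replaces the argmax.
    w = np.exp(alpha * (scores - scores.max()))
    w = w / w.sum()
    return w @ hyps, w
```

NumPy does not compute gradients itself; the point of the sketch is that every step after hypothesis sampling is a smooth function of `dirs`, so the same operations written in an automatic-differentiation framework would let a loss on the estimated keypoint propagate back to the predicted vote directions.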
