CV | Dec 20, 2020

Deep Bingham Networks: Dealing with Uncertainty and Ambiguity in Pose Estimation

arXiv:2012.11002v1 | 71 citations
AI Analysis

This work addresses the problem of handling pose-related uncertainties and ambiguities for researchers and practitioners working with 3D data in applications like camera relocalization and object pose estimation, offering an incremental improvement over existing single-solution approaches.

This paper introduces Deep Bingham Networks (DBN), a framework for pose estimation that addresses uncertainty and ambiguity by predicting a family of poses rather than a single solution. DBN extends existing direct pose regression networks with a multi-hypotheses prediction head and novel loss functions utilizing Bingham distributions, demonstrating decent advantages over state-of-the-art methods in 6D camera relocalization and achieving top results for symmetric objects in ModelNet.

In this work, we introduce Deep Bingham Networks (DBN), a generic framework that can naturally handle pose-related uncertainties and ambiguities arising in almost all real-life applications concerning 3D data. While existing works strive to find a single solution to the pose estimation problem, we make peace with the ambiguities causing high uncertainty around which solutions to identify as the best. Instead, we report a family of poses which capture the nature of the solution space. DBN extends state-of-the-art direct pose regression networks by (i) a multi-hypotheses prediction head which can yield different distribution modes; and (ii) novel loss functions that benefit from Bingham distributions on rotations. This way, DBN can work both in unambiguous cases, providing uncertainty information, and in ambiguous scenes where an uncertainty per mode is desired. On the technical front, our network regresses continuous Bingham mixture models and is applicable to both 2D data such as images and 3D data such as point clouds. We propose new training strategies to avoid mode or posterior collapse during training and to improve numerical stability. Our methods are thoroughly tested on two different applications exploiting two different modalities: (i) 6D camera relocalization from images; and (ii) object pose estimation from 3D point clouds, demonstrating decent advantages over the state of the art. For the former, we contribute our own dataset composed of five indoor scenes where it is unavoidable to capture images corresponding to views that are hard to uniquely identify. For the latter, we achieve the top results, especially for symmetric objects of the ModelNet dataset.
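To make the role of Bingham distributions on rotations more concrete, the following is a minimal sketch, not the authors' implementation. It evaluates the unnormalized Bingham log-density on unit quaternions, log p(q) = qᵀ M diag(0, z) Mᵀ q + const with z ≤ 0, and picks the best-scoring pose hypothesis for a given rotation. The function names (`quaternion_frame`, `bingham_log_unnormalized`, `best_hypothesis`), the choice of building the orthogonal matrix M from the mode quaternion, and the use of one shared concentration vector `z` across hypotheses (so that the omitted normalizing constants cancel) are all assumptions made for illustration.

```python
import numpy as np

def quaternion_frame(mode):
    """Return a 4x4 orthogonal matrix whose first column is the unit quaternion `mode`.

    Assumption for this sketch: we use the left-multiplication matrix of the
    quaternion (w, x, y, z), whose columns are mutually orthonormal.
    """
    w, x, y, z = mode / np.linalg.norm(mode)
    return np.array([
        [w, -x, -y, -z],
        [x,  w, -z,  y],
        [y,  z,  w, -x],
        [z, -y,  x,  w],
    ])

def bingham_log_unnormalized(q, M, z):
    """Unnormalized Bingham log-density q^T M diag(0, z) M^T q on the unit 3-sphere.

    `z` holds three non-positive concentration values; the normalizing
    constant is deliberately left out because it has no simple closed form.
    """
    q = q / np.linalg.norm(q)
    Z = np.diag(np.concatenate(([0.0], z)))
    return float(q @ M @ Z @ M.T @ q)

def best_hypothesis(q, modes, z):
    """Index of the hypothesis (mode quaternion) that scores `q` highest.

    All hypotheses share the same `z` here, so comparing unnormalized
    log-densities is valid; this is a simplification for illustration.
    """
    scores = [bingham_log_unnormalized(q, quaternion_frame(m), z) for m in modes]
    return int(np.argmax(scores))

if __name__ == "__main__":
    z = np.array([-10.0, -10.0, -10.0])
    modes = [np.array([1.0, 0.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0, 0.0])]
    q = np.array([0.0, -1.0, 0.0, 0.0])  # antipode of the second mode, i.e. the same rotation
    print(best_hypothesis(q, modes, z))
```

Two properties are worth noting in this sketch: the density is antipodally symmetric, so q and -q (which encode the same rotation) receive the same score, and the log-density attains its maximum of 0 at the mode. Selecting the best-scoring hypothesis is one simple way to use several predicted modes; the training strategies mentioned in the abstract are not reproduced here.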

Code Implementations: 1 repo