CVROJun 24, 2022

Optimal and Robust Category-level Perception: Object Pose and Shape Estimation from 2D and 3D Semantic Keypoints

MIT
arXiv:2206.12498v316 citationsh-index: 57Has Code
Originality Highly original
AI Analysis

This work addresses category-level perception for robotics and autonomous systems, offering incremental improvements in robustness and optimality over existing methods.

The paper tackles the problem of 3D pose and shape estimation for objects within a category (e.g., cars) from 2D or 3D keypoints, despite shape variations, by developing certifiably optimal and outlier-robust solvers (PACE3D*, PACE2D*, PACE3D#, PACE2D#) that improve state-of-the-art accuracy on datasets like ApolloScape, with runtime suitable for practical use.

We consider a category-level perception problem, where one is given 2D or 3D sensor data picturing an object of a given category (e.g., a car), and has to reconstruct the 3D pose and shape of the object despite intra-class variability (i.e., different car models have different shapes). We consider an active shape model, where -- for an object category -- we are given a library of potential CAD models describing objects in that category, and we adopt a standard formulation where pose and shape are estimated from 2D or 3D keypoints via non-convex optimization. Our first contribution is to develop PACE3D* and PACE2D*, the first certifiably optimal solvers for pose and shape estimation using 3D and 2D keypoints, respectively. Both solvers rely on the design of tight (i.e., exact) semidefinite relaxations. Our second contribution is to develop outlier-robust versions of both solvers, named PACE3D# and PACE2D#. Towards this goal, we propose ROBIN, a general graph-theoretic framework to prune outliers, which uses compatibility hypergraphs to model measurements' compatibility. We show that in category-level perception problems these hypergraphs can be built from the winding orders of the keypoints (in 2D) or their convex hulls (in 3D), and many outliers can be filtered out via maximum hyperclique computation. The last contribution is an extensive experimental evaluation. Besides providing an ablation study on simulated datasets and on the PASCAL3D+ dataset, we combine our solver with a deep keypoint detector, and show that PACE3D# improves over the state of the art in vehicle pose estimation in the ApolloScape datasets, and its runtime is compatible with practical applications. We release our code at https://github.com/MIT-SPARK/PACE.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes