CV · LG · RO · Sep 21, 2021

KDFNet: Learning Keypoint Distance Field for 6D Object Pose Estimation

arXiv:2109.10127v1 · 18 citations
Originality: Highly original
AI Analysis

This work addresses robust 6D pose estimation for robotics and AR/VR applications, improving on existing keypoint-voting methods in challenging cases such as occlusion.

The paper tackles 6D object pose estimation from RGB images, focusing on occlusion and on long, thin objects. It proposes a novel Keypoint Distance Field (KDF) representation and a distance-based voting scheme, and reports state-of-the-art results: 50.3% average ADD(-S) accuracy on Occlusion LINEMOD and 75.72% average ADD accuracy on the TOD mug subset.

We present KDFNet, a novel method for 6D object pose estimation from RGB images. To handle occlusion, many recent works have proposed to localize 2D keypoints through pixel-wise voting and then solve a Perspective-n-Point (PnP) problem for pose estimation, which achieves leading performance. However, such a voting process is direction-based and cannot handle long and thin objects, for which the direction intersections cannot be found robustly. To address this problem, we propose a novel continuous representation called the Keypoint Distance Field (KDF) for projected 2D keypoint locations. Formulated as a 2D array, each element of the KDF stores the 2D Euclidean distance between the corresponding image pixel and a specified projected 2D keypoint. We use a fully convolutional neural network to regress the KDF for each keypoint. Using this KDF encoding of projected object keypoint locations, we propose a distance-based voting scheme that localizes the keypoints by calculating circle intersections in a RANSAC fashion. We validate the design choices of our framework through extensive ablation experiments. Our proposed method achieves state-of-the-art performance on the Occlusion LINEMOD dataset, with an average ADD(-S) accuracy of 50.3%, and on the mug subset of the TOD dataset, with an average ADD accuracy of 75.72%. Extensive experiments and visualizations demonstrate that the proposed method robustly estimates the 6D pose in challenging scenarios, including occlusion.
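The two steps described in the abstract, encoding a keypoint as a per-pixel distance field and recovering it by intersecting circles in a RANSAC loop, can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the two-pixel sampling, the inlier threshold, the choice of voting pixels, and the final linearised least-squares refinement are all illustrative assumptions, and a real pipeline would vote with the network's regressed KDF rather than the exact field computed here.

```python
import numpy as np


def keypoint_distance_field(height, width, keypoint):
    """Exact KDF for one keypoint: entry [v, u] is the Euclidean distance
    from pixel (u, v) to the projected 2D keypoint (x, y)."""
    us, vs = np.meshgrid(np.arange(width), np.arange(height))  # both (H, W)
    x, y = keypoint
    return np.hypot(us - x, vs - y)


def circle_intersections(c0, r0, c1, r1):
    """Return the 0 or 2 intersection points of two circles (a tangent pair
    yields two identical points)."""
    c0, c1 = np.asarray(c0, float), np.asarray(c1, float)
    delta = c1 - c0
    d = np.hypot(*delta)
    if d == 0 or d > r0 + r1 or d < abs(r0 - r1):
        return []  # concentric, disjoint, or nested circles
    a = (r0 ** 2 - r1 ** 2 + d ** 2) / (2 * d)  # distance from c0 to chord midpoint
    h = np.sqrt(max(r0 ** 2 - a ** 2, 0.0))      # half the chord length
    mid = c0 + a * delta / d
    perp = np.array([-delta[1], delta[0]]) / d   # unit normal to the centre line
    return [mid + h * perp, mid - h * perp]


def refine_least_squares(pixels, dists):
    """Assumed refinement (not from the paper): linearised trilateration.
    Each pixel gives -2u*x - 2v*y + (x^2 + y^2) = r^2 - u^2 - v^2,
    solved for (x, y, x^2 + y^2) in the least-squares sense."""
    A = np.column_stack([-2 * pixels[:, 0], -2 * pixels[:, 1], np.ones(len(pixels))])
    b = dists ** 2 - pixels[:, 0] ** 2 - pixels[:, 1] ** 2
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    return sol[:2]


def vote_keypoint(pixels, dists, n_iters=200, inlier_thresh=1.0, seed=0):
    """RANSAC-style distance voting: sample two pixels, intersect their circles,
    and keep the hypothesis whose distances agree with the most pixels."""
    pixels = np.asarray(pixels, float)
    dists = np.asarray(dists, float)
    rng = np.random.default_rng(seed)
    best, best_inliers = None, None
    for _ in range(n_iters):
        i, j = rng.choice(len(pixels), size=2, replace=False)
        for cand in circle_intersections(pixels[i], dists[i], pixels[j], dists[j]):
            residual = np.abs(np.linalg.norm(pixels - cand, axis=1) - dists)
            inliers = residual < inlier_thresh
            if best_inliers is None or inliers.sum() > best_inliers.sum():
                best, best_inliers = cand, inliers
    if best is None:
        raise ValueError("no sampled circle pair intersected")
    if best_inliers.sum() >= 3:
        best = refine_least_squares(pixels[best_inliers], dists[best_inliers])
    return best


if __name__ == "__main__":
    # Hypothetical toy setup: a 48x64 image, one projected keypoint, and a
    # rectangular patch of "object" pixels carrying exact KDF values.
    keypoint = (37.5, 20.25)
    field = keypoint_distance_field(48, 64, keypoint)
    vs, us = np.mgrid[30:41, 5:21]
    pixels = np.column_stack([us.ravel(), vs.ravel()]).astype(float)
    print(vote_keypoint(pixels, field[vs.ravel(), us.ravel()]))
```

Two pixels are the minimal sample because two circles meet in at most two points; both points are kept as hypotheses and scored by how many other pixels' distances they explain, which is what a RANSAC loop relies on to discard erroneous predictions. Note that a distance vote needs no direction estimate, which is the property the abstract contrasts with direction-based voting. The recovered 2D keypoints would then be passed to a PnP solver, which is not shown here.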
