CV · RO · Mar 20, 2024

DVMNet++: Rethinking Relative Pose Estimation for Unseen Objects

Chen Zhao, Tong Zhang, Zheng Dang, Mathieu Salzmann

arXiv:2403.13683v2 · 3.71 citations · h-index: 10 · Has Code

Originality: Highly original

AI Analysis

This paper addresses the challenge of generalizable object pose estimation for robotics and AR/VR applications and presents a novel method rather than an incremental improvement.

The paper estimates the relative pose of an unseen object between two images without relying on ground-truth bounding boxes or discrete rotation hypotheses. It reports more accurate pose estimates at lower computational cost than state-of-the-art methods on datasets such as CO3D and LINEMOD.

Determining the relative pose of a previously unseen object between two images is pivotal to the success of generalizable object pose estimation. Existing approaches typically predict 3D translation using the ground-truth object bounding box and approximate 3D rotation with a large number of discrete hypotheses. This strategy makes unrealistic assumptions about the availability of ground truth and requires computationally expensive scoring of each hypothesis at test time. By contrast, we rethink the problem of relative pose estimation for unseen objects by presenting a Deep Voxel Matching Network (DVMNet++). Our method computes the relative object pose in a single pass, eliminating the need for ground-truth object bounding boxes and rotation hypotheses. We achieve open-set object detection by leveraging image feature embeddings and natural language understanding as reference. The detection result is then used to approximate the translation parameters and to crop the object from the query image. For rotation estimation, we map the two RGB images, i.e., the reference and the cropped query, to their respective voxelized 3D representations. The resulting voxels are passed through a rotation estimation module, which aligns the voxels and computes the rotation in an end-to-end fashion by solving a least-squares problem. To enhance robustness, we introduce a weighted closest voxel algorithm that mitigates the impact of noisy voxels. We conduct extensive experiments on the CO3D, Objaverse, LINEMOD, and LINEMOD-O datasets, demonstrating that our approach delivers more accurate relative pose estimates for novel objects at a lower computational cost than state-of-the-art methods. Our code is released at https://github.com/sailor-z/DVMNet/.
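The idea of aligning two voxel sets and solving a least-squares problem for the rotation can be illustrated with a small NumPy sketch. It is not the authors' implementation (the released code at the URL above is authoritative) and rests on several assumptions. Each voxel is reduced to a 3D centre coordinate plus a C-dimensional feature vector. Every reference voxel is matched to its most similar query voxel, a hard nearest-neighbour stand-in for the paper's weighted closest voxel algorithm. Each match is weighted by its cosine similarity, a hypothetical weighting rule. The rotation is then recovered with the weighted Kabsch/Procrustes solution to min_R Σ w_i ||R p_i − q_i||². All function names are hypothetical.

```python
import numpy as np


def voxel_centers(d):
    """Centres of a d x d x d voxel grid, shifted so the grid is centred at the origin."""
    r = np.arange(d) - (d - 1) / 2.0
    grid = np.stack(np.meshgrid(r, r, r, indexing="ij"), axis=-1)
    return grid.reshape(-1, 3)  # (d**3, 3)


def closest_voxel_matches(feat_ref, feat_qry):
    """For each reference voxel, pick the query voxel with the most similar feature.

    Returns matched query indices and per-match weights in [0, 1]
    (hypothetical rule: cosine similarity with negative values zeroed).
    """
    a = feat_ref / np.linalg.norm(feat_ref, axis=1, keepdims=True)
    b = feat_qry / np.linalg.norm(feat_qry, axis=1, keepdims=True)
    sim = a @ b.T                                   # (N_ref, N_qry) cosine similarities
    idx = sim.argmax(axis=1)                        # hard nearest neighbour in feature space
    w = np.clip(sim[np.arange(len(a)), idx], 0.0, None)
    return idx, w


def weighted_rotation(p, q, w):
    """Least-squares rotation: argmin_R sum_i w_i * ||R p_i - q_i||^2 with det(R) = +1."""
    h = (w[:, None] * p).T @ q                      # 3x3 weighted sum_i w_i p_i q_i^T
    u, _, vt = np.linalg.svd(h)
    v = vt.T
    d = np.sign(np.linalg.det(v @ u.T))             # flip one axis if SVD yields a reflection
    return v @ np.diag([1.0, 1.0, d]) @ u.T


def relative_rotation(coords_ref, feat_ref, coords_qry, feat_qry):
    """Match voxels by feature similarity, then solve the weighted least-squares problem."""
    idx, w = closest_voxel_matches(feat_ref, feat_qry)
    return weighted_rotation(coords_ref, coords_qry[idx], w)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    coords = voxel_centers(4)                       # 64 reference voxels
    feats = rng.normal(size=(len(coords), 16))      # random stand-in for learned voxel features

    # Synthetic query: the same voxels rotated by a known R (30 deg about z) and shuffled.
    angle = np.deg2rad(30.0)
    c, s = np.cos(angle), np.sin(angle)
    r_true = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    perm = rng.permutation(len(coords))
    coords_qry = (coords @ r_true.T)[perm]
    feats_qry = feats[perm]

    r_est = relative_rotation(coords, feats, coords_qry, feats_qry)
    print(np.allclose(r_est, r_true, atol=1e-6))    # prints True
```

Matches with zero weight drop out of the cross-covariance sum, which is how a weighting scheme can suppress noisy voxels. The hard `argmax` is not differentiable; an end-to-end variant would need a soft correspondence (for example, a softmax-weighted average of query coordinates) in its place.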

