CV, RO | Oct 13, 2025

DKPMV: Dense Keypoints Fusion from Multi-View RGB Frames for 6D Pose Estimation of Textureless Objects

arXiv:2510.10933v1 | 1 citation | h-index: 14
Originality: Highly original
AI Analysis

This paper addresses a challenging problem in industrial robotics by enabling accurate pose estimation without depth data, a significant advance over prior multi-view approaches.

The paper tackles 6D pose estimation of textureless objects using only multi-view RGB images. On the ROBI dataset, it outperforms existing multi-view RGB methods and surpasses RGB-D methods in most cases.

6D pose estimation of textureless objects is valuable for industrial robotic applications, yet remains challenging due to the frequent loss of depth information. Current multi-view methods either rely on depth data or insufficiently exploit multi-view geometric cues, limiting their performance. In this paper, we propose DKPMV, a pipeline that achieves dense keypoint-level fusion using only multi-view RGB images as input. We design a three-stage progressive pose optimization strategy that leverages dense multi-view keypoint geometry information. To enable effective dense keypoint fusion, we enhance the keypoint network with attentional aggregation and symmetry-aware training, improving prediction accuracy and resolving ambiguities on symmetric objects. Extensive experiments on the ROBI dataset demonstrate that DKPMV outperforms state-of-the-art multi-view RGB approaches and even surpasses the RGB-D methods in the majority of cases. The code will be available soon.
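To make the idea of multi-view keypoint geometry concrete, here is a minimal sketch of how one keypoint observed in several calibrated RGB views can be lifted to 3D by linear (DLT) triangulation. This is a generic textbook technique, not the DKPMV pipeline itself: the abstract does not spell out the three-stage optimization. The camera intrinsics, poses, helper names (`rot_y`, `make_projection`, `project`, `triangulate_point`) and the test point are all assumptions chosen for illustration.

```python
import numpy as np


def rot_y(angle):
    """Rotation matrix about the camera y-axis (illustrative helper)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def make_projection(K, R, t):
    """Build the 3x4 projection matrix P = K [R | t]."""
    return K @ np.hstack([R, t.reshape(3, 1)])


def project(P, X):
    """Project a 3D point X to pixel coordinates with projection matrix P."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


def triangulate_point(projections, pixels):
    """Linear (DLT) triangulation of one 3D point from two or more views.

    Each view contributes two linear constraints on the homogeneous point;
    the solution is the right singular vector with the smallest singular value.
    """
    rows = []
    for P, (u, v) in zip(projections, pixels):
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    _, _, vt = np.linalg.svd(A)
    X_h = vt[-1]
    return X_h[:3] / X_h[3]


# Assumed pinhole intrinsics and three assumed camera poses.
K = np.array([[600.0, 0.0, 320.0], [0.0, 600.0, 240.0], [0.0, 0.0, 1.0]])
cameras = [
    make_projection(K, rot_y(0.0), np.array([0.0, 0.0, 0.0])),
    make_projection(K, rot_y(0.2), np.array([-0.15, 0.0, 0.02])),
    make_projection(K, rot_y(-0.2), np.array([0.15, 0.0, 0.02])),
]

# A hypothetical object keypoint, observed noise-free in every view.
X_true = np.array([0.05, -0.02, 0.8])
observations = [project(P, X_true) for P in cameras]

X_est = triangulate_point(cameras, observations)
print("recovered keypoint:", X_est)
```

With real keypoint predictions the observations are noisy, so a linear estimate like this would typically serve only as an initialization for a nonlinear refinement over the object pose.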

