CV · RO · May 8

6D Pose Estimation via Keypoint Heatmap Regression with RGB-D Residual Neural Networks

arXiv:2605.08059 · 0.19 · Has Code
AI Analysis · 25

For robotic and augmented reality applications requiring accurate object pose estimation, this work offers an incremental improvement by combining existing detection and regression techniques with a cross-fusion architecture for depth data.

This paper presents a modular 6D pose estimation framework built on keypoint heatmap regression. On the LINEMOD dataset it reaches a mean ADD-based accuracy of 84.50% with RGB-only input and 92.41% with RGB-D fusion.
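For readers unfamiliar with the metric, the following is a minimal sketch of how an ADD-based accuracy is commonly computed. It is not taken from the paper's code. The function names `add_error` and `add_accuracy` are hypothetical, and the 10%-of-diameter threshold is the usual LINEMOD convention, assumed here because the summary does not state which threshold was used.

```python
import numpy as np

def add_error(model_points, R_gt, t_gt, R_pred, t_pred):
    """Mean distance between model points under the true and predicted poses.

    model_points: (N, 3) array of 3D points on the object model.
    R_*: (3, 3) rotation matrices, t_*: (3,) translation vectors.
    """
    pts_gt = model_points @ R_gt.T + t_gt
    pts_pred = model_points @ R_pred.T + t_pred
    return float(np.linalg.norm(pts_gt - pts_pred, axis=1).mean())

def add_accuracy(errors, diameter, threshold=0.1):
    """Fraction of poses whose ADD error is below threshold * diameter.

    threshold=0.1 is an assumed default (the common 10% convention).
    """
    errors = np.asarray(errors, dtype=float)
    return float((errors < threshold * diameter).mean())
```

Under this convention a pose counts as correct when its ADD error is below one tenth of the object's diameter, and the reported percentage is the share of test poses that pass.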

In this paper, we propose a modular framework for 6D pose estimation based on keypoint heatmap regression. Our approach combines YOLOv10m for object detection with a ResNet18-based network that predicts 2D heatmaps from RGB images. Keypoints extracted from these heatmaps are used to estimate the 6D object pose via the PnP RANSAC algorithm. We compare different keypoint selection strategies to assess their impact on pose accuracy. Additionally, we extend the baseline by incorporating depth data using a cross-fusion architecture, which enables interaction between RGB and depth features at multiple stages. We further explore general training improvements, such as experimenting with activation functions and learning rate scheduling strategies to improve model performance. Our best RGB-only model achieved a mean ADD-based accuracy of 84.50%, while the RGB-D fusion model reached 92.41% on the LINEMOD dataset. The code is available at https://github.com/ameermasood/HeatNet.
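The step from heatmaps to 2D keypoints can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes one heatmap per keypoint predicted on a detector crop, takes the peak of each heatmap as the keypoint, and maps it back to image coordinates. The function name `heatmaps_to_keypoints` and the `(x0, y0, x1, y1)` box layout are hypothetical.

```python
import numpy as np

def heatmaps_to_keypoints(heatmaps, box_xyxy):
    """Decode per-keypoint heatmaps into image-space 2D keypoints.

    heatmaps: (K, H, W) array, one heatmap per keypoint (assumed layout).
    box_xyxy: (x0, y0, x1, y1) detector crop in image pixels (assumed layout).
    Returns (K, 2) keypoints as (x, y) and (K,) peak values as confidences.
    """
    k, h, w = heatmaps.shape
    flat = heatmaps.reshape(k, -1)
    idx = flat.argmax(axis=1)              # peak location per heatmap
    conf = flat[np.arange(k), idx]         # peak value per heatmap
    ys, xs = np.unravel_index(idx, (h, w))
    x0, y0, x1, y1 = box_xyxy
    # Map the centre of each peak cell from heatmap space to image space.
    px = x0 + (xs + 0.5) * (x1 - x0) / w
    py = y0 + (ys + 0.5) * (y1 - y0) / h
    return np.stack([px, py], axis=1), conf
```

The decoded 2D keypoints, paired with their known 3D positions on the object model and the camera intrinsics, would then be passed to a PnP RANSAC solver (for example OpenCV's `cv2.solvePnPRansac`) to recover rotation and translation. The peak values could serve to filter out low-confidence keypoints first, though whether the paper does this is not stated in the abstract.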

