CVApr 21, 2025

Instance-Adaptive Keypoint Learning with Local-to-Global Geometric Aggregation for Category-Level Object Pose Estimation

arXiv:2504.15134v3h-index: 11
AI Analysis

This addresses the problem of handling complex geometries in unseen objects for robotics and AR/VR applications, representing an incremental improvement over existing methods.

The paper tackles category-level object pose estimation for unseen instances by proposing INKL-Pose, which learns instance-adaptive keypoints with local-to-global geometric aggregation, achieving state-of-the-art performance on benchmarks like CAMERA25 and REAL275 with 16.7M parameters and 36 FPS.

Category-level object pose estimation aims to predict the 6D pose and size of previously unseen instances from predefined categories, requiring strong generalization across diverse object instances. Although many previous methods attempt to mitigate intra-class variations, they often struggle with instances exhibiting complex geometries or significant deviations from canonical shapes. To address this issue, we propose INKL-Pose, a novel category-level object pose estimation framework that enables INstance-adaptive Keypoint Learning with local-to-global geometric aggregation. Specifically, our method first predicts semantically consistent and geometrically informative keypoints using an Instance-Adaptive Keypoint Detector, then refines them: (1) a Local Keypoint Feature Aggregator capturing fine-grained geometries, and (2) a Global Keypoint Feature Aggregator using bidirectional Mamba for structural consistency. To enable bidirectional modeling in Mamba, we introduce a simple yet effective Feature Sequence Flipping strategy that preserves spatial coherence while constructing backward feature sequence. Additionally, we design a surface loss and a separation loss to encourage uniform coverage and spatial diversity in keypoint distribution. The resulting keypoints are mapped to a canonical space for 6D pose and size regression. Extensive experiments on CAMERA25, REAL275, and HouseCat6D show that INKL-Pose achieves state-of-the-art performance with 16.7M parameters and runs at 36 FPS on an NVIDIA RTX 4090D GPU.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes