CV · Jun 6, 2024

Omni6DPose: A Benchmark and Model for Universal 6D Object Pose Estimation and Tracking

arXiv:2406.04316v1 · 51 citations
Originality: Incremental advance
AI Analysis

This work addresses a critical data-scarcity problem in 6D object pose estimation, which matters to researchers and practitioners in robotics and AR/VR. Its method contribution is incremental, because GenPose++ builds on an existing category-level pose estimation framework.

The paper tackles the lack of large-scale datasets for 6D object pose estimation by introducing Omni6DPose, a diverse dataset with over 800K images and 6.5M annotations, and presents GenPose++, an enhanced method that achieves state-of-the-art results on this benchmark.

6D Object Pose Estimation is a crucial yet challenging task in computer vision, suffering from a significant lack of large-scale datasets. This scarcity impedes comprehensive evaluation of model performance, limiting research advancements. Furthermore, the restricted number of available instances or categories curtails its applications. To address these issues, this paper introduces Omni6DPose, a substantial dataset characterized by its diversity in object categories, large scale, and variety in object materials. Omni6DPose is divided into three main components: ROPE (Real 6D Object Pose Estimation Dataset), which includes 332K images annotated with over 1.5M annotations across 581 instances in 149 categories; SOPE (Simulated 6D Object Pose Estimation Dataset), consisting of 475K images created in a mixed reality setting with depth simulation, annotated with over 5M annotations across 4162 instances in the same 149 categories; and the manually aligned real scanned objects used in both ROPE and SOPE. Omni6DPose is inherently challenging due to its substantial variations and ambiguities. To address this challenge, we introduce GenPose++, an enhanced version of the SOTA category-level pose estimation framework, incorporating two pivotal improvements: Semantic-aware feature extraction and Clustering-based aggregation. Moreover, we provide a comprehensive benchmarking analysis to evaluate the performance of previous methods on this large-scale dataset in the realms of 6D object pose estimation and pose tracking.
