CV · May 18, 2022

PhoCaL: A Multi-Modal Dataset for Category-Level Object Pose Estimation with Photometrically Challenging Objects

arXiv:2205.08811v1 · 61 citations · h-index: 58
Originality: Synthesis-oriented
AI Analysis

PhoCaL provides a benchmark for robotic and augmented reality applications, but the contribution is incremental: it focuses on dataset creation rather than new methods.

The authors address the need for high-quality datasets in category-level object pose estimation by introducing PhoCaL, a multimodal dataset of 60 household objects across 8 categories, including reflective and transparent objects. A novel robot-supported acquisition process yields sub-millimeter pose accuracy.

Object pose estimation is crucial for robotic applications and augmented reality. Beyond instance level 6D object pose estimation methods, estimating category-level pose and shape has become a promising trend. As such, a new research field needs to be supported by well-designed datasets. To provide a benchmark with high-quality ground truth annotations to the community, we introduce a multimodal dataset for category-level object pose estimation with photometrically challenging objects termed PhoCaL. PhoCaL comprises 60 high quality 3D models of household objects over 8 categories including highly reflective, transparent and symmetric objects. We developed a novel robot-supported multi-modal (RGB, depth, polarisation) data acquisition and annotation process. It ensures sub-millimeter accuracy of the pose for opaque textured, shiny and transparent objects, no motion blur and perfect camera synchronisation. To set a benchmark for our dataset, state-of-the-art RGB-D and monocular RGB methods are evaluated on the challenging scenes of PhoCaL.

