RO · CV · Jul 27, 2021

VIPose: Real-time Visual-Inertial 6D Object Pose Tracking

arXiv:2107.12617v2 · 10 citations
Originality: Incremental advance
AI Analysis

This paper addresses 6D object pose tracking for robotics and AR/VR applications, offering incremental improvements in occlusion handling and real-time efficiency.

The paper tackles 6D object pose tracking by introducing VIPose, a deep neural network that fuses visual and inertial data to predict relative poses between frames, achieving real-time performance with accuracy comparable to state-of-the-art methods on a new dataset.

Estimating the 6D pose of objects is beneficial for robotics tasks such as transportation, autonomous navigation, and manipulation, as well as in scenarios beyond robotics such as virtual and augmented reality. Compared with single-image pose estimation, pose tracking takes into account the temporal information across multiple frames to overcome possible detection inconsistencies and to improve the pose estimation efficiency. In this work, we introduce a novel Deep Neural Network (DNN) called VIPose that combines inertial and camera data to address the object pose tracking problem in real time. The key contribution is the design of a novel DNN architecture that fuses visual and inertial features to predict the objects' relative 6D pose between consecutive image frames. The overall 6D pose is then estimated by consecutively combining relative poses. Our approach shows remarkable pose estimation results for heavily occluded objects, which are well known to be very challenging for existing state-of-the-art solutions. The effectiveness of the proposed approach is validated on a new dataset called VIYCB, which contains RGB images, IMU data, and accurate 6D pose annotations created with an automated labeling technique. The approach achieves accuracy comparable to state-of-the-art techniques, with the additional benefit of running in real time.
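
The step of "consecutively combining relative poses" can be illustrated with a minimal sketch. It assumes each pose is a 4x4 homogeneous transform and that a relative pose left-multiplies the previous absolute pose; the abstract does not state the paper's actual convention, and the helper names (make_pose, rot_z, accumulate_poses) are hypothetical, chosen only for this illustration.

```python
import numpy as np


def make_pose(rotation, translation):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector."""
    pose = np.eye(4)
    pose[:3, :3] = rotation
    pose[:3, 3] = translation
    return pose


def rot_z(theta):
    """Rotation by theta radians about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def accumulate_poses(initial_pose, relative_poses):
    """Chain per-frame relative poses into absolute poses.

    Assumed convention (not taken from the paper): T_k = delta_k @ T_{k-1}.
    Returns the list [T_0, T_1, ..., T_n].
    """
    poses = [initial_pose]
    for delta in relative_poses:
        poses.append(delta @ poses[-1])
    return poses


# Three identical frame-to-frame motions: 30 degrees about z plus a small shift.
deltas = [make_pose(rot_z(np.pi / 6), [0.1, 0.0, 0.0])] * 3
trajectory = accumulate_poses(np.eye(4), deltas)
```

Because every absolute pose is a product of all earlier relative estimates, any error in one predicted relative pose carries forward into later frames, which is a general property of this kind of chaining rather than a claim about the paper's results.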

