CV · IV · Nov 20, 2021

VideoPose: Estimating 6D object pose from videos

arXiv:2111.10677v1 · 1 citation
Originality: Incremental advance
AI Analysis

The approach enables real-time object pose estimation for robotics and AR applications, though the contribution is incremental because it builds on existing 2D detection and temporal modeling methods.

The paper tackles 6D object pose estimation from videos by exploiting temporal information with a CNN-RNN architecture. It reaches accuracy on par with state-of-the-art methods on the YCB-Video dataset while running in real time at 30 fps.

We introduce a simple yet effective algorithm that uses convolutional neural networks to directly estimate object poses from videos. Our approach leverages the temporal information in a video sequence and is computationally efficient and robust enough to support robotics and AR applications. Our proposed network builds on a pre-trained 2D object detector and aggregates visual features through a recurrent neural network to make a prediction at each frame. Experimental evaluation on the YCB-Video dataset shows that our approach is on par with state-of-the-art algorithms. Further, running at 30 fps, it is more efficient than the state of the art and is therefore applicable to a variety of applications that require real-time object pose estimation.
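The pipeline described in the abstract (per-frame features from a pre-trained 2D detector, aggregated over time by a recurrent network that emits a pose at every frame) can be illustrated with a minimal NumPy sketch. Everything below is an assumption for illustration only: the class names `GRUCell` and `VideoPoseSketch` are hypothetical, the detector features are random placeholders, the recurrent unit is a plain GRU, the pose is parameterized as a unit quaternion plus a 3D translation, and the weights are untrained. None of these specifics are confirmed by the source.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class GRUCell:
    """Plain GRU cell (hypothetical stand-in for the paper's recurrent unit)."""

    def __init__(self, in_dim, hid_dim, rng):
        s = 1.0 / np.sqrt(hid_dim)
        # Stacked weights for the update (0), reset (1) and candidate (2) gates.
        self.W = rng.uniform(-s, s, (3, hid_dim, in_dim))
        self.U = rng.uniform(-s, s, (3, hid_dim, hid_dim))
        self.b = np.zeros((3, hid_dim))

    def __call__(self, x, h):
        z = sigmoid(self.W[0] @ x + self.U[0] @ h + self.b[0])        # update gate
        r = sigmoid(self.W[1] @ x + self.U[1] @ h + self.b[1])        # reset gate
        n = np.tanh(self.W[2] @ x + self.U[2] @ (r * h) + self.b[2])  # candidate state
        return (1.0 - z) * n + z * h                                  # blend old and new


class VideoPoseSketch:
    """Hypothetical pipeline: detector features -> GRU -> per-frame 6D pose head."""

    def __init__(self, feat_dim=64, hid_dim=32, seed=0):
        rng = np.random.default_rng(seed)
        self.hid_dim = hid_dim
        self.gru = GRUCell(feat_dim, hid_dim, rng)
        # Linear head: 4 quaternion components + 3 translation components.
        self.head = rng.normal(0.0, 0.1, (7, hid_dim))

    def run(self, frame_features):
        h = np.zeros(self.hid_dim)  # hidden state carried across frames
        poses = []
        for x in frame_features:
            h = self.gru(x, h)
            out = self.head @ h
            q = out[:4] / max(np.linalg.norm(out[:4]), 1e-8)  # unit quaternion (rotation)
            t = out[4:]                                      # translation
            poses.append((q, t))
        return poses


# Placeholder for per-frame features from a pre-trained 2D detector (5 frames).
features = np.random.default_rng(1).normal(size=(5, 64))
poses = VideoPoseSketch().run(features)
print(len(poses), poses[0][0].shape, poses[0][1].shape)
```

Because the hidden state `h` is carried from frame to frame, each prediction depends on the sequence seen so far rather than on the current frame alone. In a trained system the weights would be learned and the features would come from the detector instead of a random generator.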

Code Implementations: 1 repo