CV | Oct 20, 2022

Transformer-based Global 3D Hand Pose Estimation in Two Hands Manipulating Objects Scenarios

arXiv:2210.11384v1 | 1 citation | h-index: 22
Originality: Incremental advance
AI Analysis

This work addresses the challenge of accurate hand pose estimation in complex two-handed manipulation scenarios, with applications such as VR/AR and robotics.

The authors tackled the problem of estimating global 3D hand poses from egocentric images where two hands interact with objects, achieving errors of 14.4 mm for the left hand and 15.9 mm for the right hand on a test set.

This report describes our 1st place solution to the ECCV 2022 challenge on Human Body, Hands, and Activities (HBHA) from Egocentric and Multi-view Cameras (hand pose estimation track). In this challenge, the goal is to estimate global 3D hand poses from an input image, captured from an egocentric viewpoint, in which two hands interact with an object. Our proposed method performs end-to-end multi-hand pose estimation with a transformer architecture. In particular, it robustly estimates hand poses in scenarios where two hands interact. Additionally, we propose an algorithm that takes hand scale into account to robustly estimate absolute depth. The proposed algorithm works well even when hand sizes vary from person to person. Our method attains errors of 14.4 mm (left hand) and 15.9 mm (right hand) on the test set.
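The abstract does not spell out how hand scale enters the depth estimate, so the following is only a minimal sketch of the general idea under a pinhole camera model, not the authors' algorithm. It assumes that the metric length of a reference bone and its projected length in pixels are known; the function names `depth_from_hand_scale` and `back_project`, their parameters, and the numbers used are all hypothetical.

```python
import numpy as np


def depth_from_hand_scale(bone_px, bone_mm, focal_px):
    """Approximate absolute depth (mm) from apparent hand size.

    Pinhole relation: bone_px = focal_px * bone_mm / depth,
    hence depth = focal_px * bone_mm / bone_px.
    Assumes the reference bone is roughly parallel to the image plane.
    """
    if bone_px <= 0:
        raise ValueError("bone_px must be positive")
    return focal_px * bone_mm / bone_px


def back_project(uv, depth_mm, focal_px, principal_point):
    """Lift a pixel (u, v) at a given depth to camera coordinates (mm)."""
    u, v = uv
    cx, cy = principal_point
    x = (u - cx) * depth_mm / focal_px
    y = (v - cy) * depth_mm / focal_px
    return np.array([x, y, depth_mm], dtype=float)


# Two people whose reference bone spans the same 50 px in the image
# but has different real lengths end up at different depths.
depth_large_hand = depth_from_hand_scale(50.0, 100.0, 500.0)
depth_small_hand = depth_from_hand_scale(50.0, 80.0, 500.0)

# Global position of the hand root for the larger hand.
root_xyz = back_project((420.0, 240.0), depth_large_hand, 500.0, (320.0, 240.0))
```

The example illustrates why a fixed, person-independent hand size would bias the absolute depth: the same pixel extent maps to different depths once the real hand size differs, which is the situation the abstract says the proposed algorithm is designed to handle.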
