Transformer-based Global 3D Hand Pose Estimation in Two Hands Manipulating Objects Scenarios
This work addresses the challenge of accurate hand pose estimation in complex two-handed manipulation scenarios, with applications such as VR/AR and robotics.
The authors tackled the problem of estimating global 3D hand poses from egocentric images where two hands interact with objects, achieving errors of 14.4 mm for the left hand and 15.9 mm for the right hand on a test set.
This report describes our 1st place solution to the ECCV 2022 challenge on Human Body, Hands, and Activities (HBHA) from Egocentric and Multi-view Cameras (hand pose estimation). In this challenge, we aim to estimate global 3D hand poses from an input image in which two hands interact with an object, captured from an egocentric viewpoint. Our proposed method performs end-to-end multi-hand pose estimation via a transformer architecture. In particular, our method robustly estimates hand poses in scenarios where two hands interact. Additionally, we propose an algorithm that accounts for hand scale to robustly estimate absolute depth. The proposed algorithm works well even when hand sizes vary from person to person. Our method attains errors of 14.4 mm (left hand) and 15.9 mm (right hand) on the test set.
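To illustrate why hand scale matters for absolute depth, the following is a minimal sketch under a pinhole camera assumption. It is not the algorithm proposed in this report, whose details are not given here. The function names `estimate_root_depth` and `backproject`, the choice of a single reference bone, and all numeric values are hypothetical. The sketch assumes the reference bone lies roughly parallel to the image plane, so that depth follows from the ratio of its metric length to its pixel length.

```python
import numpy as np

def estimate_root_depth(bone_px, bone_mm, focal_px):
    """Approximate depth (mm) from a reference bone, pinhole model.

    Assumes the bone is roughly fronto-parallel, so Z ~= f * L_metric / l_pixels.
    bone_mm is the per-person hand scale (hypothetical input).
    """
    return focal_px * bone_mm / bone_px

def backproject(uv, depth_mm, focal_px, principal_point):
    """Lift a pixel (u, v) at a known depth to camera-space (X, Y, Z) in mm."""
    uv = np.asarray(uv, dtype=float)
    pp = np.asarray(principal_point, dtype=float)
    xy = (uv - pp) * depth_mm / focal_px
    return np.array([xy[0], xy[1], depth_mm])

# Hypothetical values: focal length 600 px, reference bone spans 108 px.
focal_px = 600.0
bone_px = 108.0

# The same 2D observation yields different depths for different hand sizes.
depth_small = estimate_root_depth(bone_px, bone_mm=90.0, focal_px=focal_px)
depth_large = estimate_root_depth(bone_px, bone_mm=99.0, focal_px=focal_px)

# Global 3D position of the root joint for the smaller hand.
root_xyz = backproject((380.0, 300.0), depth_small, focal_px, (320.0, 240.0))
```

In this toy setting, a hand that is 10% larger is placed 10% farther away for the same image evidence, which is why ignoring per-person hand scale biases the absolute depth.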