CV, Jun 18, 2024

PCIE_EgoHandPose Solution for EgoExo4D Hand Pose Challenge

arXiv:2406.12219v1, 3 citations, Has Code
Originality: Incremental advance
AI Analysis

This work addresses hand pose estimation for egocentric vision applications, representing an incremental improvement in a specific domain.

The authors tackled the problem of accurately estimating 3D hand poses from RGB egocentric video, a challenge due to subtle movements and occlusions, and achieved first place in the EgoExo4D Hand Pose Challenge with scores of 25.51 MPJPE and 8.49 PA-MPJPE.

This report presents our team's 'PCIE_EgoHandPose' solution for the EgoExo4D Hand Pose Challenge at CVPR 2024. The goal of the challenge is to accurately estimate hand poses, each consisting of 21 3D joints, from an RGB egocentric video image. The task is particularly challenging because of subtle movements and occlusions. To handle this complexity, we propose the Hand Pose Vision Transformer (HP-ViT). HP-ViT comprises a ViT backbone and a transformer head that estimate joint positions in 3D, and it uses MPJPE and RLE loss functions. Our approach achieved 1st place in the Hand Pose Challenge with 25.51 MPJPE and 8.49 PA-MPJPE. Code is available at https://github.com/KanokphanL/PCIE_EgoHandPose
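
For readers unfamiliar with the two metrics quoted above, the following is a minimal NumPy sketch of how MPJPE and PA-MPJPE are commonly computed for a single hand with 21 joints. It is an illustration under stated assumptions, not the challenge's official evaluation code or the authors' implementation: the function names `mpjpe`, `procrustes_align`, and `pa_mpjpe` are hypothetical, the inputs are assumed to be arrays of shape (21, 3), and the result is in whatever unit the input coordinates use.

```python
import numpy as np


def mpjpe(pred, gt):
    """Mean per-joint position error: mean Euclidean distance over joints."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


def procrustes_align(pred, gt):
    """Align pred (J, 3) to gt (J, 3) with the best similarity transform
    (scale, rotation, translation), following the Kabsch/Umeyama approach.
    Assumes pred is not degenerate (its joints are not all the same point)."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)

    # Remove translation by centering both joint sets.
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g

    # Optimal rotation from the SVD of the 3x3 cross-covariance matrix.
    U, S, Vt = np.linalg.svd(p.T @ g)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T

    # Optimal isotropic scale for the chosen rotation.
    scale = (S * np.diag(D)).sum() / (p ** 2).sum()

    return scale * (p @ R.T) + mu_g


def pa_mpjpe(pred, gt):
    """Procrustes-aligned MPJPE: MPJPE after similarity alignment to gt."""
    return mpjpe(procrustes_align(pred, gt), gt)
```

Because PA-MPJPE removes global scale, rotation, and translation before measuring the error, it reflects the quality of the articulated hand shape alone, which is consistent with the PA-MPJPE score above being lower than the MPJPE score.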

Code Implementations: 1 repo
