CV · May 21

REACH: Hand Pose Estimation from Room Corners

arXiv:2605.22231 · 14.2
Predicted impact: top 77% in CV · last 90 days
Originality: Incremental advance
AI Analysis

This work addresses the challenging problem of remote 3D hand pose estimation for in-the-wild behavior analysis, offering a practical solution for continuous monitoring without intrusive sensors.

The paper introduces REACH-Net, a Transformer-based model for 3D hand pose estimation from low-resolution, frequently occluded views captured by fixed cameras at room corners. The model draws on hand-body coordination, its temporal progression, and multiview observations. It is trained and tested on REACH, a new large-scale dataset covering 50 participants, and the authors report accurate results in comparative studies with existing methods.

We introduce a novel 3D hand pose estimator that can accurately recover the shape and pose of people's hands in a room from afar, typically from fixed cameras at room corners, in extremely low-resolution and frequently occluded views. Our key idea is to fully leverage hand-body coordination, its temporal progression, and multiview observations. We achieve this with a novel Transformer-based model, in which hand and body configurations are modeled through correlations between their visual features expressed as per-view tokens, and their temporal coordination is exploited in an autoregressive manner. We introduce a novel dataset, which we refer to as REACH, Room-Environment dataset Annotated with Chest cameras for Hand pose estimation, to train and test our method. REACH is a first-of-its-kind large-scale hand pose dataset that captures accurate hand movements of 50 participants across a wide variety of daily activities. In order to avoid interfering with natural movements while annotating the hands with accurate shape and pose, we leverage concealed chest cameras. Through extensive experiments, including comparative studies with existing methods, we show that our model, REACH-Net, achieves highly accurate 3D hand pose estimation from afar. These results broaden the horizon of 3D hand pose estimation, especially towards "in-the-wild" continuous human behavior analysis.

