CV | Jul 17, 2023

Uncertainty-aware State Space Transformer for Egocentric 3D Hand Trajectory Forecasting

arXiv:2307.08243v2 | 38 citations | h-index: 41
AI Analysis

This work addresses a domain-specific problem for AR/VR applications by moving hand trajectory forecasting from 2D to 3D. It is an incremental advance that applies a novel method to a known bottleneck.

The paper tackles the problem of predicting 3D hand trajectories from egocentric RGB videos, which is crucial for AR/VR systems, and proposes an uncertainty-aware state space Transformer (USST) that achieves superior performance on H2O and EgoPAT3D datasets.

Hand trajectory forecasting from egocentric views is crucial for enabling a prompt understanding of human intentions when interacting with AR/VR systems. However, existing methods handle this problem in a 2D image space which is inadequate for 3D real-world applications. In this paper, we set up an egocentric 3D hand trajectory forecasting task that aims to predict hand trajectories in a 3D space from early observed RGB videos in a first-person view. To fulfill this goal, we propose an uncertainty-aware state space Transformer (USST) that takes the merits of the attention mechanism and aleatoric uncertainty within the framework of the classical state-space model. The model can be further enhanced by the velocity constraint and visual prompt tuning (VPT) on large vision transformers. Moreover, we develop an annotation workflow to collect 3D hand trajectories with high quality. Experimental results on H2O and EgoPAT3D datasets demonstrate the superiority of USST for both 2D and 3D trajectory forecasting. The code and datasets are publicly released: https://actionlab-cv.github.io/EgoHandTrajPred.
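The abstract names two ingredients, aleatoric uncertainty and a velocity constraint, without giving their form. The sketch below is one common way to express them and is not taken from the paper or its released code: it assumes the model outputs a per-coordinate mean and log-variance for each future 3D hand position, scores them with a heteroscedastic Gaussian negative log-likelihood, and adds a penalty on mismatched frame-to-frame displacements. The function names, the `vel_weight` parameter, and its default value are hypothetical.

```python
import numpy as np


def gaussian_nll(pred_mean, pred_log_var, target):
    """Heteroscedastic Gaussian NLL, averaged over all coordinates.

    The constant 0.5 * log(2 * pi) term is dropped. A larger predicted
    variance down-weights the squared error but is itself penalised.
    """
    sq_err = (target - pred_mean) ** 2
    return 0.5 * np.mean(pred_log_var + sq_err * np.exp(-pred_log_var))


def velocity_penalty(pred_mean, target):
    """Mean squared error between frame-to-frame displacements."""
    pred_vel = np.diff(pred_mean, axis=0)  # shape (T - 1, 3)
    true_vel = np.diff(target, axis=0)
    return np.mean((pred_vel - true_vel) ** 2)


def trajectory_loss(pred_mean, pred_log_var, target, vel_weight=0.1):
    """Assumed combined objective: NLL plus a weighted velocity term."""
    nll = gaussian_nll(pred_mean, pred_log_var, target)
    return nll + vel_weight * velocity_penalty(pred_mean, target)


# Toy example: a 5-step 3D trajectory with a noisy prediction.
rng = np.random.default_rng(0)
target = np.cumsum(rng.normal(size=(5, 3)), axis=0)
pred_mean = target + 0.1 * rng.normal(size=(5, 3))
pred_log_var = np.full((5, 3), -2.0)
loss = trajectory_loss(pred_mean, pred_log_var, target)
```

Predicting a log-variance rather than a variance keeps the variance positive without an explicit constraint, which is why this parameterisation is a common choice for such losses.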

Code Implementations: 1 repo
