CV · Oct 31, 2022

UmeTrack: Unified multi-view end-to-end hand tracking for VR

arXiv:2211.00099v1 · 90 citations · h-index: 34
Originality: Incremental advance
AI Analysis

This paper addresses the problem of accurate and efficient hand tracking for VR interaction, though it appears to be an incremental advance that builds on existing multi-view and end-to-end approaches.

The paper tackles real-time 3D hand pose tracking in world space for VR by introducing a unified, end-to-end differentiable framework that predicts 3D hand pose directly from multi-view inputs. It demonstrates the framework's effectiveness on a new large-scale dataset and in real-time VR applications.

Real-time tracking of 3D hand pose in world space is a challenging problem and plays an important role in VR interaction. Existing work in this space is limited to either producing root-relative (versus world-space) 3D pose or relying on multiple stages, such as generating heatmaps and kinematic optimization, to obtain 3D pose. Moreover, the typical VR scenario, which involves multi-view tracking from wide field-of-view (FoV) cameras, is seldom addressed by these methods. In this paper, we present a unified end-to-end differentiable framework for multi-view, multi-frame hand tracking that directly predicts 3D hand pose in world space. We demonstrate the benefits of end-to-end differentiability by extending our framework with downstream tasks such as jitter reduction and pinch prediction. To demonstrate the efficacy of our model, we further present a new large-scale egocentric hand pose dataset that consists of both real and synthetic data. Experiments show that our system trained on this dataset handles various challenging interactive motions and has been successfully applied to real-time VR applications.
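To make the root-relative versus world-space distinction concrete, here is a minimal Python sketch. It is an illustration only, not UmeTrack's method: it assumes each camera already produced 3D keypoints in its own frame, maps them into a shared world frame with assumed camera-to-world extrinsics, and averages them. All names (`apply_transform`, `root_relative`, `fuse_views_to_world`) and the two-camera setup are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat4 = Sequence[Sequence[float]]  # 4x4 row-major rigid transform (camera-to-world)


def apply_transform(T: Mat4, p: Vec3) -> Vec3:
    """Apply a 4x4 homogeneous transform to a 3D point (last row assumed [0, 0, 0, 1])."""
    x, y, z = p
    return tuple(T[i][0] * x + T[i][1] * y + T[i][2] * z + T[i][3] for i in range(3))


def root_relative(keypoints: List[Vec3], root_index: int = 0) -> List[Vec3]:
    """Subtract the root joint (e.g. the wrist) so the pose no longer encodes where the hand is."""
    rx, ry, rz = keypoints[root_index]
    return [(x - rx, y - ry, z - rz) for x, y, z in keypoints]


def fuse_views_to_world(per_view: List[List[Vec3]], cam_to_world: List[Mat4]) -> List[Vec3]:
    """Naive late fusion (an assumption): map each view into world space, then average per joint."""
    if not per_view or len(per_view) != len(cam_to_world):
        raise ValueError("expected one keypoint list per camera")
    world = [[apply_transform(T, p) for p in kps] for kps, T in zip(per_view, cam_to_world)]
    n_views = len(world)
    return [
        tuple(sum(view[j][k] for view in world) / n_views for k in range(3))
        for j in range(len(world[0]))
    ]


# Hypothetical two-camera setup: camera A sits at the world origin; camera B sits at
# x = -0.2 m with axes aligned, so its camera-space x coordinates are 0.2 larger.
IDENTITY = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
SHIFT_B = [[1, 0, 0, -0.2], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

# Each view predicts two joints (wrist, index tip) in its own camera frame;
# camera A's wrist estimate is off by 1 cm in x.
view_a = [(0.01, 0.0, 0.5), (0.0, 0.1, 0.5)]
view_b = [(0.2, 0.0, 0.5), (0.2, 0.1, 0.5)]

fused_world = fuse_views_to_world([view_a, view_b], [IDENTITY, SHIFT_B])
relative = root_relative(fused_world)
print(fused_world)  # world space: keeps the wrist's absolute position
print(relative)     # root-relative: the wrist collapses to the origin
```

Averaging per-view outputs is used here only to show the coordinate frames involved; the abstract instead describes a single end-to-end network that consumes all views and frames and outputs world-space pose directly, which avoids a separate fusion or optimization stage.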

