CV | Oct 2, 2025

Ego-Exo 3D Hand Tracking in the Wild with a Mobile Multi-Camera Rig

arXiv:2510.02601v1 | 1 citation | h-index: 20
Originality: Incremental advance
AI Analysis

The paper addresses limited environmental diversity and weak model generalization in egocentric computer vision, a problem relevant to both researchers and practitioners. Its contribution is incremental: it pairs existing tracking methods with a new capture system.

The paper tackles the challenge of accurate 3D hand tracking in unconstrained settings by introducing a marker-less multi-camera system that captures precise 3D hands and objects in the wild, demonstrating a significant reduction in the trade-off between environmental realism and 3D annotation accuracy.

Accurate 3D tracking of hands and their interactions with the world in unconstrained settings remains a significant challenge for egocentric computer vision. With few exceptions, existing datasets are predominantly captured in controlled lab setups, limiting environmental diversity and model generalization. To address this, we introduce a novel marker-less multi-camera system designed to capture precise 3D hands and objects, which allows for nearly unconstrained mobility in genuinely in-the-wild conditions. We combine a lightweight, back-mounted capture rig with eight exocentric cameras, and a user-worn Meta Quest 3 headset, which contributes two egocentric views. We design an ego-exo tracking pipeline to generate accurate 3D hand pose ground truth from this system, and rigorously evaluate its quality. By collecting an annotated dataset featuring synchronized multi-view images and precise 3D hand poses, we demonstrate the capability of our approach to significantly reduce the trade-off between environmental realism and 3D annotation accuracy.

