CV, May 20

Map-Mono-Ego: Map-Grounded Global Human Pose Estimation from Monocular Egocentric Video

arXiv:2605.20889 30.1
AI Analysis

This work addresses the challenge of absolute location tracking in monocular egocentric pose estimation, which is crucial for ubiquitous activity monitoring without specialized hardware.

MapMonoEgo achieves globally consistent human pose estimation from monocular egocentric video by leveraging a pre-scanned 3D point cloud, significantly outperforming the state-of-the-art baseline on the new AIST-Living dataset.

Monocular egocentric human pose estimation is essential for ubiquitous activity monitoring. However, determining the user's absolute location within the environment remains a challenge. Existing methods primarily estimate motion relative to an initial position and tend not to account for the wearer's absolute location. Furthermore, the inherent scale ambiguity of monocular vision leads to severe translational drift, which limits long-term tracking without specialized multi-sensor hardware. To address this, we propose MapMonoEgo, a novel framework that achieves globally consistent human pose estimation from a monocular camera alone by leveraging a pre-scanned 3D point cloud. We also introduce the AIST-Living dataset, a new dataset that pairs egocentric video with ground-truth motion in a scanned environment. Experiments demonstrate that our approach significantly outperforms the state-of-the-art baseline, demonstrating its utility for practical monitoring tasks without specialized hardware.
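
The link between scale ambiguity and translational drift can be illustrated with a small numeric sketch. This is not the paper's method; it only shows why integrating relative motion with a slightly wrong metric scale produces an error that grows with distance travelled. The function name `integrate_positions`, the step size, and the 10% scale error are all hypothetical values chosen for illustration.

```python
import numpy as np

def integrate_positions(steps, scale=1.0):
    """Accumulate per-frame displacement vectors into a trajectory.

    `scale` stands in for the unknown metric scale of a monocular estimate
    (hypothetical parameter, not taken from the paper).
    """
    return np.cumsum(scale * np.asarray(steps, dtype=float), axis=0)

# Hypothetical walk: 100 frames, 0.05 m forward per frame (5 m in total).
true_steps = np.tile([0.05, 0.0, 0.0], (100, 1))
true_traj = integrate_positions(true_steps)

# Assume the estimator recovers direction correctly but its scale is 10% off.
est_traj = integrate_positions(true_steps, scale=1.1)

# Per-frame position error between the estimated and true trajectories.
drift = np.linalg.norm(est_traj - true_traj, axis=1)
print(f"drift after first frame: {drift[0]:.3f} m")
print(f"drift after last frame:  {drift[-1]:.3f} m")
```

Under these assumptions the error grows linearly with path length, reaching about 0.5 m after the 5 m walk. An absolute reference such as a pre-scanned map is one way to bound this kind of error, since each position can then be checked against fixed geometry rather than only against the previous frame.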

