CV · Apr 12, 2021

Egocentric Pose Estimation from Human Vision Span

arXiv:2104.05167v1 · 38 citations
Originality: Incremental advance
AI Analysis

This work addresses a realistic problem for wearable-device users by enabling pose estimation from peripheral views. It is incremental in that it adapts existing methods to a new setting.

The paper tackles egocentric pose estimation from a human vision span, where the wearer appears only in the peripheral view (partially or not at all), and proposes a deep learning system that integrates SLAM features and body shape imagery to achieve real-time, high-accuracy 3D pose estimation.

Estimating the camera wearer's body pose from an egocentric view (egopose) is a vital task in augmented and virtual reality. Existing approaches either use a narrow-field-of-view, front-facing camera that barely captures the wearer, or an extruded head-mounted top-down camera for maximal wearer visibility. In this paper, we tackle egopose estimation from a more natural human vision span, where the camera wearer can be seen in the peripheral view and, depending on the head pose, may become invisible or only partially visible. This is a realistic visual field for user-centric wearable devices, such as glasses, that have front-facing wide-angle cameras. Existing solutions are not appropriate for this setting, so we propose a novel deep learning system that takes advantage of both the dynamic features from camera SLAM and the body shape imagery. We compute the 3D head pose, the 3D body pose, and the figure/ground separation simultaneously, while explicitly enforcing a certain geometric consistency across these pose attributes. We further show that this system can be trained robustly with large amounts of existing mocap data, so we do not have to collect and annotate large new datasets. Lastly, our system estimates egopose in real time and on the fly while maintaining high accuracy.
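The abstract does not spell out which geometric consistency is enforced. As a purely illustrative sketch (an assumption, not the paper's formulation), one such check compares the head joint of a predicted world-frame body pose with the head position implied by a SLAM-style head pose, given as a rotation `R` and translation `t` that map camera-frame points to world coordinates via `R @ p + t`. The function names `apply_pose` and `head_consistency_error`, and the fixed camera-to-head offset `head_offset`, are hypothetical.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]


def apply_pose(rotation: Mat3, translation: Vec3, point: Vec3) -> Vec3:
    """Map a point from the head (camera) frame to the world frame: R @ p + t."""
    return tuple(
        sum(rotation[i][k] * point[k] for k in range(3)) + translation[i]
        for i in range(3)
    )


def head_consistency_error(
    rotation: Mat3,
    translation: Vec3,
    head_joint_world: Vec3,
    head_offset: Vec3,
) -> float:
    """Euclidean distance between the body-pose head joint (world frame) and the
    head position implied by the head pose.

    `head_offset` is a hypothetical fixed position of the head joint expressed in
    the camera frame (e.g. slightly behind and below a glasses-mounted camera).
    A value near zero means the head pose and body pose agree geometrically.
    """
    expected = apply_pose(rotation, translation, head_offset)
    return math.dist(expected, head_joint_world)
```

In a training setup, a residual like this could be added as a penalty term so that the head and body branches cannot drift apart; at inference it can serve as a simple sanity check on the two estimates.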
