Social EgoMesh Estimation
This work addresses a known bottleneck for virtual and augmented reality applications: it improves ego-mesh estimation by exploiting the social interactions between the camera wearer and the interactee.
The paper tackles 3D pose estimation of the camera wearer in egocentric video sequences, which is challenging because little of the wearer's body is visible. It proposes SEE-ME, a framework that reduces the pose estimation error (MPJPE) by 53% compared to the current best technique.
Accurately estimating the 3D pose of the camera wearer in egocentric video sequences is crucial to modeling human behavior in virtual and augmented reality applications. The task presents unique challenges due to the limited visibility of the user's body caused by the front-facing camera mounted on their head. Recent research has explored the utilization of the scene and ego-motion, but it has overlooked humans' interactive nature. We propose a novel framework for Social Egocentric Estimation of body MEshes (SEE-ME). Our approach is the first to estimate the wearer's mesh using only a latent probabilistic diffusion model, which we condition on the scene and, for the first time, on the social wearer-interactee interactions. Our in-depth study sheds light on when social interaction matters most for ego-mesh estimation; it quantifies the impact of interpersonal distance and gaze direction. Overall, SEE-ME surpasses the current best technique, reducing the pose estimation error (MPJPE) by 53%. The code is available at https://github.com/L-Scofano/SEEME.
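For readers unfamiliar with the metric, MPJPE (Mean Per Joint Position Error) is the average Euclidean distance between predicted and ground-truth 3D joint positions. The sketch below is not taken from the SEE-ME code base. It is a minimal illustration that assumes joints are given as (x, y, z) tuples in a shared coordinate frame; the function names `mpjpe` and `relative_reduction` and all numbers are hypothetical toy values, chosen only to show how a relative reduction such as the reported 53% would be computed from two error values.

```python
import math
from typing import Sequence

Joint = Sequence[float]  # assumed layout: (x, y, z) in a shared coordinate frame


def mpjpe(pred: Sequence[Joint], gt: Sequence[Joint]) -> float:
    """Mean Euclidean distance between corresponding predicted and true joints."""
    if len(pred) != len(gt) or len(pred) == 0:
        raise ValueError("pred and gt must be non-empty and have the same length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


def relative_reduction(baseline_error: float, new_error: float) -> float:
    """Percentage by which new_error is lower than baseline_error."""
    return 100.0 * (baseline_error - new_error) / baseline_error


# Toy example with two joints (hypothetical values, not results from the paper).
gt = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
pred = [(3.0, 4.0, 0.0), (0.0, 1.0, 0.0)]

error = mpjpe(pred, gt)  # joint distances are 5.0 and 0.0, so the mean is 2.5
print(f"MPJPE: {error}")

# A drop from a baseline error of 100.0 to 47.0 corresponds to a 53% reduction.
print(f"Reduction: {relative_reduction(100.0, 47.0)}%")
```

In practice the metric is averaged over all frames of a sequence and is often reported in millimetres; the exact evaluation protocol used for the 53% figure is defined in the paper and its repository, not by this sketch.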