PACE: Human and Camera Motion Estimation from in-the-wild Videos
This work addresses an important computer vision problem with applications in augmented reality and robotics, though its contribution is incremental, since it builds on existing SLAM and motion-prior techniques.
The paper tackles the problem of estimating human and camera motion from videos captured by moving cameras. It proposes a joint optimization framework that disentangles the two motions using human motion priors and background scene features. The method outperforms prior art on a synthetic dataset and on the real-world RICH dataset in recovering both human and camera motion.
We present a method to estimate human motion in a global scene from moving cameras. This is a highly challenging task due to the coupling of human and camera motions in the video. To address this problem, we propose a joint optimization framework that disentangles human and camera motions using both foreground human motion priors and background scene features. Unlike existing methods that use SLAM as initialization, we propose to tightly integrate SLAM and human motion priors in an optimization that is inspired by bundle adjustment. Specifically, we optimize human and camera motions to match both the observed human pose and scene features. This design combines the strengths of SLAM and motion priors, which leads to significant improvements in human and camera motion estimation. We additionally introduce a motion prior that is suitable for batch optimization, making our approach significantly more efficient than existing approaches. Finally, we propose a novel synthetic dataset that enables evaluating camera motion in addition to human motion from dynamic videos. Experiments on the synthetic and real-world RICH datasets demonstrate that our approach substantially outperforms prior art in recovering both human and camera motions.
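The idea of optimizing human and camera motion together, so that both the observed human pose and the scene features are explained, can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes a translation-only 2D toy world in which every observation is a position relative to the camera, and it substitutes a simple second-difference smoothness penalty for the learned human motion prior. The function name `joint_estimate` and its parameters are hypothetical and chosen only for illustration.

```python
import numpy as np


def joint_estimate(human_obs, scene_obs, prior_weight=1.0):
    """Toy joint solve for camera, human, and landmark positions.

    Assumptions (illustrative only, not the paper's formulation):
      human_obs: (T, 2) human position in the camera frame, h_t - c_t
      scene_obs: (T, K, 2) static landmarks in the camera frame, b_k - c_t
    Returns (camera, human, landmarks) in a world frame where camera[0] = 0.
    """
    T, K = scene_obs.shape[:2]
    n = 2 * T + K  # unknown layout: [c_0..c_{T-1}, h_0..h_{T-1}, b_0..b_{K-1}]
    rows, targets = [], []

    def add_residual(coeffs, target):
        # One linear residual: sum(coeff * unknown) - target
        row = np.zeros(n)
        for index, value in coeffs:
            row[index] = value
        rows.append(row)
        targets.append(target)

    # Gauge constraint: pin the first camera position to the origin.
    add_residual([(0, 1.0)], np.zeros(2))

    for t in range(T):
        # Foreground term: the human as seen from the camera.
        add_residual([(T + t, 1.0), (t, -1.0)], human_obs[t])
        # Background term: static scene features as seen from the camera.
        for k in range(K):
            add_residual([(2 * T + k, 1.0), (t, -1.0)], scene_obs[t, k])

    # Stand-in motion prior: penalize human acceleration (second difference).
    w = np.sqrt(prior_weight)
    for t in range(1, T - 1):
        add_residual(
            [(T + t - 1, w), (T + t, -2.0 * w), (T + t + 1, w)], np.zeros(2)
        )

    A = np.stack(rows)
    y = np.stack(targets)  # shape (num_residuals, 2): one column per axis
    x, *_ = np.linalg.lstsq(A, y, rcond=None)
    return x[:T], x[T:2 * T], x[2 * T:]
```

In this toy setting the human observations alone cannot separate human motion from camera motion, because only the difference h_t - c_t is seen. The background residuals tie the camera to static landmarks, and the smoothness term regularizes the human trajectory when observations are noisy. A real system would also have to handle rotation, perspective projection, and the unknown scale of SLAM reconstructions, all of which this sketch omits.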