Camera Motion Agnostic 3D Human Pose Estimation
This work addresses the challenge of recovering pure human motion from moving-camera video for applications such as robotics and sports analysis. It is an incremental improvement over prior methods, which were limited to camera- or human-centered coordinates.
The paper tackles the estimation of 3D human pose and mesh in world coordinates from videos captured with moving cameras, a setting in which existing methods struggle because the estimated pose is coupled to the camera motion. It proposes a camera motion agnostic approach that predicts global motion sequences and demonstrates its effectiveness through experiments on 3DPW and synthetic datasets.
Although the performance of 3D human pose and shape estimation methods has improved significantly in recent years, existing approaches typically generate 3D poses defined in a camera- or human-centered coordinate system. This makes it difficult to estimate a person's pure pose and motion in the world coordinate system for a video captured with a moving camera. To address this issue, this paper presents a camera motion agnostic approach for predicting 3D human pose and mesh defined in the world coordinate system. The core idea of the proposed approach is to estimate the difference between two adjacent global poses (i.e., the global motion), which is invariant to the choice of coordinate system, instead of the global pose, which is coupled to the camera motion. To this end, we propose the global motion regressor (GMR), a network based on bidirectional gated recurrent units (GRUs) that predicts the global motion sequence from the local pose sequence, which consists of the relative rotations of joints. For evaluation, we use 3DPW and synthetic datasets, which are constructed in a moving-camera environment. We conduct extensive experiments and demonstrate the effectiveness of the proposed method empirically. Code and datasets are available at https://github.com/seonghyunkim1212/GMR
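The invariance claim behind the global motion can be checked numerically. The sketch below is only an illustration under stated assumptions, not the paper's implementation: it assumes the "difference between two adjacent global poses" is the relative rotation R_t^T R_{t+1} of the root orientation, and it uses a made-up orientation sequence and a made-up fixed change of reference frame A. All function and variable names are hypothetical.

```python
import math

def rot_z(theta):
    """Rotation matrix about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_x(theta):
    """Rotation matrix about the x axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(a):
    """Transpose of a 3x3 matrix (the inverse, for a rotation)."""
    return [[a[j][i] for j in range(3)] for i in range(3)]

def global_motion(poses):
    """Assumed definition: relative rotation R_t^T R_{t+1} between adjacent frames."""
    return [matmul(transpose(poses[t]), poses[t + 1])
            for t in range(len(poses) - 1)]

def accumulate(initial, motions):
    """Rebuild the orientation sequence from an initial pose and the motions."""
    poses = [initial]
    for m in motions:
        poses.append(matmul(poses[-1], m))
    return poses

def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two 3x3 matrices."""
    return max(abs(a[i][j] - b[i][j]) for i in range(3) for j in range(3))

# Hypothetical global root orientations expressed in one reference frame.
poses_w = [matmul(rot_z(0.1 * t), rot_x(0.05 * t)) for t in range(5)]

# The same sequence expressed in a second, fixed reference frame (R' = A R).
A = matmul(rot_x(0.7), rot_z(-1.2))
poses_w2 = [matmul(A, R) for R in poses_w]

motion_w = global_motion(poses_w)
motion_w2 = global_motion(poses_w2)

# The global poses differ between the two frames, but the motions agree.
invariance_error = max(max_abs_diff(m1, m2)
                       for m1, m2 in zip(motion_w, motion_w2))

# Accumulating the motions from the first pose recovers the full sequence.
recovered = accumulate(poses_w[0], motion_w)
reconstruction_error = max(max_abs_diff(r, p)
                           for r, p in zip(recovered, poses_w))

print(invariance_error, reconstruction_error)
```

Under this assumed definition, A cancels because (A R_t)^T (A R_{t+1}) = R_t^T R_{t+1}, so both errors are zero up to floating-point rounding. The sketch covers only a fixed change of reference frame. A moving camera changes the frame at every time step, which is why the abstract describes regressing the global motion from the local pose sequence rather than differencing camera-coordinate poses.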