ReMP: Reusable Motion Prior for Multi-domain 3D Human Pose Estimation and Motion Inbetweening
This addresses the challenge of robust 3D motion tracking for applications in computer vision and robotics, though it appears incremental as it builds on foundation models and prior work in motion priors.
The paper tackles the problem of 3D human pose estimation and motion inbetweening across multiple domains by proposing ReMP, a reusable motion prior that improves accuracy and training efficiency, consistently outperforming baselines on diverse data like depth point clouds and LiDAR scans.
We present Reusable Motion prior (ReMP), an effective motion prior that can accurately track the temporal evolution of motion in various downstream tasks. Inspired by the success of foundation models, we argue that a robust spatio-temporal motion prior can encapsulate underlying 3D dynamics applicable to various sensor modalities. We learn the rich motion prior from a sequence of complete parametric models of posed human body shape. Our prior can easily estimate poses in missing frames or noisy measurements despite significant occlusion by employing a temporal attention mechanism. More interestingly, our prior can guide the system with incomplete and challenging input measurements to quickly extract critical information to estimate the sequence of poses, significantly improving the training efficiency for mesh sequence recovery. ReMP consistently outperforms the baseline method on diverse and practical 3D motion data, including depth point clouds, LiDAR scans, and IMU sensor data. Project page is available in https://hojunjang17.github.io/ReMP.