Zero-shot Human Pose Estimation using Diffusion-based Inverse solvers
This work addresses the poor generalization of pose estimation across users caused by body-size variation, offering a solution for practical applications with sparse sensors.
The paper tackles human pose estimation with a limited number of body sensors by formulating it as an inverse problem. The proposed InPose method uses a pre-trained diffusion model conditioned on rotational measurements alone and guided by a likelihood term derived from the measured locations, achieving zero-shot generalization across users.
Pose estimation refers to tracking a human's full body posture, including their head, torso, arms, and legs. The problem is challenging in practical settings where the number of body sensors is limited. Past work has shown promising results using conditional diffusion models, where the pose prediction is conditioned on both the location and rotation measurements from the sensors. Unfortunately, nearly all these approaches generalize poorly across users, primarily because location measurements are highly influenced by the body size of the user. In this paper, we formulate pose estimation as an inverse problem and design an algorithm capable of zero-shot generalization. Our approach uses a pre-trained diffusion model and conditions it on rotational measurements alone; the priors from this model are then guided by a likelihood term derived from the measured locations. Thus, given any user, our proposed InPose method generatively estimates a highly likely sequence of poses that best explains the sparse on-body measurements.
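To make the inverse-problem view concrete, the toy sketch below combines a prior score with a location-likelihood score, so that the user's body size enters only through the likelihood term. It is an illustration under strong assumptions, not the InPose algorithm: the "pose" is a single planar joint angle, a fixed Gaussian stands in for the pre-trained diffusion prior, plain gradient ascent replaces diffusion sampling, and every name in it (`prior_score`, `forward_location`, `likelihood_score`, `estimate_pose`, `limb_length`, `sigma`) is hypothetical.

```python
import math


def prior_score(theta, mu=0.0, s=0.5):
    """Gradient of log N(theta; mu, s^2).

    Assumption: a fixed Gaussian stands in for the learned, rotation-conditioned
    pose prior. It does not depend on the user's body size.
    """
    return -(theta - mu) / (s * s)


def forward_location(theta, limb_length):
    """Toy forward kinematics: sensor position at the end of one planar limb."""
    return (limb_length * math.cos(theta), limb_length * math.sin(theta))


def likelihood_score(theta, y, limb_length, sigma=0.05):
    """Gradient w.r.t. theta of log N(y; f(theta), sigma^2 I).

    Body size (limb_length) appears only here, through the forward model.
    """
    fx, fy = forward_location(theta, limb_length)
    # Derivative of the forward model with respect to theta.
    dfx = -limb_length * math.sin(theta)
    dfy = limb_length * math.cos(theta)
    return ((y[0] - fx) * dfx + (y[1] - fy) * dfy) / (sigma * sigma)


def estimate_pose(y, limb_length, steps=500, lr=0.01):
    """Gradient ascent on log prior + log likelihood (a MAP estimate).

    Assumption: this replaces guided diffusion sampling with a much simpler
    optimizer, purely to show how the two score terms are combined.
    """
    theta = 0.0
    for _ in range(steps):
        theta += lr * (prior_score(theta) + likelihood_score(theta, y, limb_length))
    return theta


if __name__ == "__main__":
    true_theta = 0.8
    for limb_length in (0.3, 0.4):  # two "users" with different body sizes
        y = forward_location(true_theta, limb_length)
        print(limb_length, estimate_pose(y, limb_length))
```

In this sketch the same prior is reused for both limb lengths and only the likelihood term is given the user's size, which mirrors, in a very reduced form, why guiding a size-agnostic prior with a location likelihood could generalize across users without retraining.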