PoseTrack: Joint Multi-Person Pose Estimation and Tracking
This paper addresses the challenge of tracking multiple people's poses over time in videos. The contribution is incremental in that it builds on existing image-based pose estimation methods.
The authors tackle joint multi-person pose estimation and tracking in unconstrained videos by representing body joint detections as a spatio-temporal graph and partitioning it with an integer linear program. They evaluate the method and several baselines on a new dataset they introduce.
In this work, we introduce the challenging problem of joint multi-person pose estimation and tracking of an unknown number of persons in unconstrained videos. Existing methods for multi-person pose estimation in images cannot be applied directly to this problem, since it requires solving the problem of person association over time in addition to estimating the pose of each person. We therefore propose a novel method that jointly models multi-person pose estimation and tracking in a single formulation. To this end, we represent body joint detections in a video by a spatio-temporal graph and solve an integer linear program to partition the graph into sub-graphs that correspond to plausible body pose trajectories for each person. The proposed approach implicitly handles occlusion and truncation of persons. Since the problem has not been addressed quantitatively in the literature, we introduce a challenging "Multi-Person PoseTrack" dataset, and also propose a completely unconstrained evaluation protocol that does not make any assumptions about the scale, size, location or the number of persons. Finally, we evaluate the proposed approach and several baseline methods on our new dataset.
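The graph-partitioning idea can be illustrated with a minimal sketch on a toy instance. It is not the authors' formulation: the detections, the distance-based `pair_cost`, the threshold `tau`, and all function names are hypothetical, and exhaustive enumeration of partitions stands in for an integer linear programming solver, which is feasible only for a handful of nodes. Nodes are joint detections, spatial edges connect detections in the same frame, temporal edges connect detections of the same joint type in adjacent frames, and the partition with the lowest total cost over joined pairs is taken as the set of person trajectories.

```python
import math
from itertools import combinations

# Hypothetical toy detections: (frame, joint_type, x, y).
# Two people, two frames, two joint types each.
DETECTIONS = [
    (0, "head", 10.0, 10.0),
    (0, "neck", 10.0, 20.0),
    (0, "head", 100.0, 10.0),
    (0, "neck", 100.0, 20.0),
    (1, "head", 12.0, 10.0),
    (1, "neck", 12.0, 20.0),
    (1, "head", 98.0, 10.0),
    (1, "neck", 98.0, 20.0),
]


def pair_cost(a, b, tau=20.0):
    """Assumed cost of putting two detections in the same sub-graph.

    Negative values favour joining, positive values favour separating.
    Pairs with neither a spatial nor a temporal edge contribute nothing.
    """
    fa, ja, xa, ya = a
    fb, jb, xb, yb = b
    spatial = fa == fb                           # same frame
    temporal = ja == jb and abs(fa - fb) == 1    # same joint, adjacent frames
    if spatial or temporal:
        return math.hypot(xa - xb, ya - yb) - tau
    return 0.0


def labelings(n):
    """Yield every partition of n items as a restricted growth string."""
    def rec(labels, next_label):
        if len(labels) == n:
            yield tuple(labels)
            return
        for lab in range(next_label + 1):
            labels.append(lab)
            yield from rec(labels, max(next_label, lab + 1))
            labels.pop()
    yield from rec([], 0)


def total_cost(labels, dets):
    """Sum the pair costs over all pairs assigned to the same sub-graph."""
    return sum(
        pair_cost(dets[i], dets[j])
        for i, j in combinations(range(len(dets)), 2)
        if labels[i] == labels[j]
    )


def partition_detections(dets):
    """Brute-force the lowest-cost partition (stand-in for an ILP solver)."""
    return min(labelings(len(dets)), key=lambda lab: total_cost(lab, dets))


labels = partition_detections(DETECTIONS)
for det, person in zip(DETECTIONS, labels):
    print(person, det)
```

Each label identifies one sub-graph, so detections sharing a label across frames form one person's pose trajectory. Because a partition is a valid labeling by construction, the transitivity constraints that an integer linear program would have to state explicitly are satisfied automatically here.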