CV · Jan 23, 2019

A Top-down Approach to Articulated Human Pose Estimation and Tracking

arXiv:1901.07680v1 · 46 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses multi-person pose estimation and tracking in video, but it is incremental: it combines existing methods into a baseline system.

The paper tackles multi-person human pose estimation and tracking in videos by building a top-down baseline system with three modules (a human detector, a pose estimator, and a flow-based tracker) and evaluates it in the two ECCV 2018 PoseTrack challenges.

Both multi-person human pose estimation and pose tracking in videos are challenging tasks. Existing methods fall into two groups: top-down and bottom-up approaches. In this paper, we follow the top-down approach and aim to build a strong baseline system with three modules: a human candidate detector, a single-person pose estimator and a human pose tracker. First, we choose a generic object detector from among state-of-the-art methods to detect human candidates. Then, the cascaded pyramid network (CPN) estimates the pose of each candidate. Finally, a flow-based pose tracker associates keypoints across frames, i.e., it assigns each human candidate a unique and temporally consistent id for multi-target pose tracking. We conduct extensive ablation experiments to validate various choices of models and configurations. We take part in two ECCV 2018 PoseTrack challenges: pose estimation and pose tracking.
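The abstract does not spell out how the flow-based tracker turns keypoints into ids, so the following is only a minimal sketch of one common scheme, under stated assumptions. Each track's keypoints from the previous frame are shifted by the optical-flow displacement at their locations. The shifted poses are then scored against the current frame's pose estimates with a simplified object keypoint similarity (OKS, using one constant `kappa` for every joint). Finally, pairs are matched greedily above a threshold, and unmatched poses receive new ids. The flow is passed in as a plain function `(x, y) -> (dx, dy)` rather than a dense flow field, and the names `FlowTracker`, `propagate` and `oks`, as well as the values `kappa=0.1` and `threshold=0.5`, are hypothetical choices, not the paper's.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

Keypoints = List[Tuple[float, float]]            # (x, y) per joint
Flow = Callable[[float, float], Tuple[float, float]]  # assumed: (x, y) -> (dx, dy)


def propagate(pose: Keypoints, flow: Flow) -> Keypoints:
    """Shift every keypoint by the flow displacement at its location."""
    moved = []
    for x, y in pose:
        dx, dy = flow(x, y)
        moved.append((x + dx, y + dy))
    return moved


def oks(a: Keypoints, b: Keypoints, kappa: float = 0.1) -> float:
    """Simplified OKS: mean of exp(-d^2 / (2 * area * kappa^2)) over joints,
    with the area taken from the bounding box of pose b (assumption)."""
    xs = [x for x, _ in b]
    ys = [y for _, y in b]
    area = max((max(xs) - min(xs)) * (max(ys) - min(ys)), 1.0)
    total = 0.0
    for (xa, ya), (xb, yb) in zip(a, b):
        d2 = (xa - xb) ** 2 + (ya - yb) ** 2
        total += math.exp(-d2 / (2.0 * area * kappa ** 2))
    return total / len(a)


class FlowTracker:
    """Hypothetical greedy tracker: keeps only the tracks seen in the last frame."""

    def __init__(self, threshold: float = 0.5) -> None:
        self.threshold = threshold
        self.prev: Dict[int, Keypoints] = {}
        self.next_id = 0

    def update(self, poses: List[Keypoints], flow: Flow) -> List[int]:
        # Warp last frame's poses into the current frame using the flow.
        propagated = {tid: propagate(p, flow) for tid, p in self.prev.items()}
        # Score every (track, current pose) pair, best similarity first.
        pairs = sorted(
            ((oks(propagated[tid], pose), tid, j)
             for tid in propagated for j, pose in enumerate(poses)),
            reverse=True,
        )
        ids: List[Optional[int]] = [None] * len(poses)
        used = set()
        for score, tid, j in pairs:
            if score < self.threshold:
                break  # remaining pairs are even less similar
            if tid in used or ids[j] is not None:
                continue  # each track and each pose is matched at most once
            ids[j] = tid
            used.add(tid)
        # Poses with no sufficiently similar track start a new id.
        for j in range(len(poses)):
            if ids[j] is None:
                ids[j] = self.next_id
                self.next_id += 1
        self.prev = {ids[j]: poses[j] for j in range(len(poses))}
        return ids
```

Calling `update` once per frame with that frame's pose estimates returns one id per pose. A person who keeps moving consistently with the flow keeps the same id even if the detector lists people in a different order. Greedy matching is used here for simplicity; a bipartite solver (e.g. the Hungarian algorithm) is a natural alternative.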
