Multi-person Pose Tracking using Sequential Monte Carlo with Probabilistic Neural Pose Predictor
This work addresses tracking errors in multi-person pose estimation for video analysis, offering a significant, though incremental, improvement over existing methods.
The paper tackles multi-person pose tracking in videos by extending frame-by-frame prediction and matching with Sequential Monte Carlo (SMC) to handle prediction uncertainty. It achieves a state-of-the-art MOTA score on the PoseTrack2018 validation dataset, reducing tracking errors by about 50% compared to a state-of-the-art baseline.
Employing prediction and pose matching in a frame-by-frame manner is an effective strategy for multi-person pose tracking in videos. For this type of approach, uncertainty-aware modeling is essential because precise prediction is impossible. However, previous studies have relied on a single prediction without incorporating uncertainty, which can cause critical tracking errors when that prediction is unreliable. This paper proposes extending this approach with Sequential Monte Carlo (SMC). The extension naturally reformulates the tracking scheme to handle multiple predictions (or hypotheses) of poses, thereby mitigating the negative effect of prediction errors. An important component of SMC, the proposal distribution, is designed as a probabilistic neural pose predictor, which can propose diverse and plausible hypotheses by incorporating epistemic uncertainty and heteroscedastic aleatoric uncertainty. In addition, a recurrent architecture is introduced into the neural model to exploit time-sequence information of poses and handle difficult situations, such as the frequent disappearance and reappearance of poses. The proposed method achieves a state-of-the-art MOTA score on the PoseTrack2018 validation dataset, reducing tracking errors by approximately 50% relative to a state-of-the-art baseline method.
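To make the multi-hypothesis idea concrete, the following is a minimal sketch of one bootstrap-style SMC update for a single tracked pose. It is not the paper's implementation. The function names (`propose`, `likelihood`, `smc_step`, `estimate`), the flat list-of-coordinates pose format, the fixed noise level `sigma`, and the Gaussian similarity kernel are all assumptions chosen for illustration. In particular, `propose` is only a stand-in for the probabilistic neural pose predictor, which in the paper would supply input-dependent (heteroscedastic) uncertainty rather than a constant.

```python
import math
import random


def propose(pose, sigma, rng):
    """Hypothetical stand-in for the probabilistic pose predictor: draws a
    next-frame pose from a Gaussian centred on the current pose."""
    return [x + rng.gauss(0.0, sigma) for x in pose]


def likelihood(hypothesis, detection, scale=1.0):
    """Assumed pose-matching score: Gaussian kernel on the mean squared
    coordinate distance between a hypothesis and a detected pose."""
    d2 = sum((h - d) ** 2 for h, d in zip(hypothesis, detection)) / len(detection)
    return math.exp(-d2 / (2.0 * scale ** 2))


def smc_step(particles, weights, detection, sigma, rng):
    """One SMC update: propose hypotheses, reweight by the detection,
    normalise, then resample."""
    proposed = [propose(p, sigma, rng) for p in particles]
    new_w = [w * likelihood(p, detection) for w, p in zip(weights, proposed)]
    total = sum(new_w)
    if total == 0.0:
        # Every hypothesis was implausible; fall back to uniform weights.
        new_w = [1.0 / len(proposed)] * len(proposed)
    else:
        new_w = [w / total for w in new_w]
    # Multinomial resampling concentrates particles on likely hypotheses.
    resampled = rng.choices(proposed, weights=new_w, k=len(proposed))
    uniform = [1.0 / len(resampled)] * len(resampled)
    return resampled, uniform


def estimate(particles, weights):
    """Weighted mean pose over all hypotheses."""
    n = len(particles[0])
    return [sum(w * p[i] for p, w in zip(particles, weights)) for i in range(n)]


# Usage: track a toy two-keypoint pose (x1, y1, x2, y2) over three frames.
rng = random.Random(0)
particles = [[0.0, 0.0, 10.0, 10.0] for _ in range(200)]
weights = [1.0 / 200] * 200
for detection in ([1.0, 1.0, 11.0, 11.0],
                  [2.0, 2.0, 12.0, 12.0],
                  [3.0, 3.0, 13.0, 13.0]):
    particles, weights = smc_step(particles, weights, detection, 2.0, rng)
print(estimate(particles, weights))
```

Because each track carries many hypotheses rather than one, a single poor prediction only lowers the weight of some particles instead of derailing the track. A recurrent predictor would replace `propose` with a model conditioned on the pose history, which this sketch omits.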