CV, Oct 20, 2021

Self-Supervision and Spatial-Sequential Attention Based Loss for Multi-Person Pose Estimation

arXiv:2110.10734v1
Originality: Incremental advance
AI Analysis

This work addresses performance issues in multi-person pose estimation for computer vision applications, representing an incremental improvement over existing methods.

The paper tackles low feature utilization and contradictory predictions in bottom-up multi-person pose estimation. It proposes a new loss organization method and a new combination of predictions (heatmaps, Part Affinity Fields, and block-inside offsets), which together yield an mAP improvement of more than 5.5% over the OpenPose baseline on the COCO verification dataset.

Bottom-up multi-person pose estimation approaches use heatmaps with auxiliary predictions to estimate joint positions and the person each joint belongs to at the same time. Recently, various combinations of auxiliary predictions and heatmaps have been proposed for higher performance; these predictions are supervised directly by the corresponding L2 loss function. However, the lack of more explicit supervision results in low feature utilization and contradictions between predictions within one model. To solve these problems, this paper proposes (i) a new loss organization method that uses self-supervised heatmaps to reduce prediction contradictions and spatial-sequential attention to enhance the network's feature extraction; and (ii) a new combination of predictions, composed of heatmaps, Part Affinity Fields (PAFs), and our block-inside offsets, that fixes pixel-level joint positions and further demonstrates the effectiveness of the proposed loss function. Experiments are conducted on the MS COCO keypoint dataset with OpenPose as the baseline model. Our method outperforms the baseline overall. On the COCO verification dataset, the mAP of OpenPose trained with our proposals exceeds that of the OpenPose baseline by over 5.5%.
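
As a point of reference for the direct L2 supervision mentioned in the abstract, here is a minimal sketch of how a single joint heatmap is commonly built and scored. It is not the paper's proposed loss: the Gaussian target, the `sigma` value, the 8x8 grid, and all function names are assumptions chosen for illustration.

```python
import math


def gaussian_heatmap(width, height, cx, cy, sigma=1.0):
    """Hypothetical ground-truth heatmap: a 2D Gaussian peaked at joint (cx, cy)."""
    return [
        [
            math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
            for x in range(width)
        ]
        for y in range(height)
    ]


def l2_loss(pred, target):
    """Sum of squared per-pixel differences between two same-sized maps."""
    return sum(
        (p - t) ** 2
        for pred_row, target_row in zip(pred, target)
        for p, t in zip(pred_row, target_row)
    )


def argmax_2d(heatmap):
    """Return (x, y) of the highest-scoring pixel, i.e. the decoded joint position."""
    best = max(
        ((value, x, y) for y, row in enumerate(heatmap) for x, value in enumerate(row)),
        key=lambda item: item[0],
    )
    return best[1], best[2]


# Assumed toy setup: one joint at (3, 5) on an 8x8 grid.
target = gaussian_heatmap(8, 8, cx=3, cy=5)
perfect = [row[:] for row in target]       # a prediction identical to the target
blank = [[0.0] * 8 for _ in range(8)]      # a prediction that detects nothing
```

In this toy setup, `l2_loss(perfect, target)` is zero, while `l2_loss(blank, target)` is positive. Because the loss compares each map only with its own target, nothing in it ties the heatmap to the auxiliary predictions, which is the gap in supervision the abstract points to.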

