CV | Mar 4, 2023

DistilPose: Tokenized Pose Regression with Heatmap Distillation

arXiv:2303.02455v3 | 38 citations | h-index: 39
AI Analysis

This work addresses a key bottleneck in human pose estimation for applications that require real-time efficiency. It offers a novel hybrid approach whose contribution is incremental but yields strong, specific gains.

The paper tackles the problem of combining the speed of regression-based methods with the accuracy of heatmap-based methods in human pose estimation. It proposes DistilPose, which uses tokenization and simulated heatmaps for knowledge distillation. DistilPose-L achieves 74.4% mAP on the MSCOCO validation dataset, a new state of the art for regression-based models.

In the field of human pose estimation, regression-based methods have dominated in terms of speed, while heatmap-based methods are far ahead in terms of performance. How to take advantage of both schemes remains a challenging problem. In this paper, we propose a novel human pose estimation framework termed DistilPose, which bridges the gap between heatmap-based and regression-based methods. Specifically, DistilPose maximizes the transfer of knowledge from the teacher model (heatmap-based) to the student model (regression-based) through a Token-distilling Encoder (TDE) and Simulated Heatmaps. TDE aligns the feature spaces of heatmap-based and regression-based models by introducing tokenization, while Simulated Heatmaps transfer explicit guidance (distribution and confidence) from teacher heatmaps into student models. Extensive experiments show that the proposed DistilPose can significantly improve the performance of regression-based models while maintaining efficiency. Specifically, on the MSCOCO validation dataset, DistilPose-S obtains 71.6% mAP with 5.36M parameters, 2.38 GFLOPs and 40.2 FPS. Compared with its teacher model, it uses 12.95x fewer parameters and 7.16x fewer GFLOPs and runs 4.9x faster, with only a 0.9-point drop in performance. Furthermore, DistilPose-L obtains 74.4% mAP on the MSCOCO validation dataset, achieving a new state of the art among predominant regression-based models.
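The abstract does not spell out how Simulated Heatmaps are built, so the following is only a minimal sketch of the general idea: a regression student's output is rendered as a heatmap so it can be compared directly with a teacher heatmap. It assumes the student predicts a keypoint location plus a spread, that the simulated heatmap is a 2D Gaussian, and that the distillation loss is a plain mean squared error. The function names, grid size and loss choice are all hypothetical, and the confidence part of the guidance is left out.

```python
import math


def simulated_heatmap(mu_x, mu_y, sigma_x, sigma_y, width, height):
    """Render a 2D Gaussian centered on a regressed keypoint.

    Hypothetical helper: the Gaussian form is an assumption, not the
    paper's confirmed formulation.
    """
    return [
        [
            math.exp(-0.5 * (((x - mu_x) / sigma_x) ** 2
                             + ((y - mu_y) / sigma_y) ** 2))
            for x in range(width)
        ]
        for y in range(height)
    ]


def mse(a, b):
    """Mean squared error between two equally sized 2D grids."""
    total = sum((pa - pb) ** 2
                for row_a, row_b in zip(a, b)
                for pa, pb in zip(row_a, row_b))
    return total / (len(a) * len(a[0]))


# Toy stand-in for a teacher heatmap: a peak at (x=5, y=3) on a 12x8 grid.
teacher = simulated_heatmap(5.0, 3.0, 1.5, 1.5, width=12, height=8)

# Two candidate student predictions, one close to the peak and one far away.
good = simulated_heatmap(5.2, 3.1, 1.5, 1.5, width=12, height=8)
bad = simulated_heatmap(9.0, 6.0, 1.5, 1.5, width=12, height=8)

loss_good = mse(good, teacher)
loss_bad = mse(bad, teacher)
print(loss_good < loss_bad)
```

In this toy setup the prediction closer to the teacher's peak receives the lower loss, which is the kind of signal a heatmap teacher could pass to a coordinate-regressing student under these assumptions.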

Code Implementations: 1 repo
