CV · Dec 13, 2020

EfficientPose: Efficient Human Pose Estimation with Neural Architecture Search

arXiv:2012.07086v1 · 52 citations
AI Analysis

This work provides more efficient human pose estimation models, which is crucial for real-time multimedia applications on resource-constrained devices.

This paper addresses efficient human pose estimation for resource-constrained devices by proposing a framework with a customized backbone and a customized head. Its smallest model achieves 88.1% PCKh@0.5 on MPII with only 0.65 GFLOPs, and its large model is competitive in accuracy with the state-of-the-art HRNet (9.5 GFLOPs) while using only 2 GFLOPs.

Human pose estimation from image and video is a vital task in many multimedia applications. Previous methods achieve great performance but rarely take efficiency into consideration, which makes it difficult to implement the networks on resource-constrained devices. Nowadays real-time multimedia applications call for more efficient models for better interactions. Moreover, most deep neural networks for pose estimation directly reuse the networks designed for image classification as the backbone, which are not yet optimized for the pose estimation task. In this paper, we propose an efficient framework targeted at human pose estimation including two parts, the efficient backbone and the efficient head. By implementing the differentiable neural architecture search method, we customize the backbone network design for pose estimation and reduce the computation cost with negligible accuracy degradation. For the efficient head, we slim the transposed convolutions and propose a spatial information correction module to promote the performance of the final prediction. In experiments, we evaluate our networks on the MPII and COCO datasets. Our smallest model has only 0.65 GFLOPs with 88.1% PCKh@0.5 on MPII and our large model has only 2 GFLOPs while its accuracy is competitive with the state-of-the-art large model, i.e., HRNet with 9.5 GFLOPs.
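The MPII figures above are reported as PCKh@0.5: a predicted keypoint counts as correct when it lies within 0.5 times the person's head segment length of the ground-truth location. The following is a minimal sketch of that metric, not the paper's evaluation code. The function name `pckh`, its arguments, and the choice to pass head sizes in directly (rather than derive them from annotations) are assumptions made for illustration.

```python
import math


def pckh(pred, gt, head_sizes, visible=None, alpha=0.5):
    """Fraction of keypoints within alpha * head size of the ground truth.

    pred, gt:    list of persons, each a list of (x, y) keypoints.
    head_sizes:  one head segment length per person, in pixels
                 (assumed to be supplied by the caller).
    visible:     optional per-person lists of booleans; joints marked
                 False are skipped and do not count toward the total.
    """
    correct = 0
    total = 0
    for i, (p_kps, g_kps) in enumerate(zip(pred, gt)):
        threshold = alpha * head_sizes[i]
        for j, ((px, py), (gx, gy)) in enumerate(zip(p_kps, g_kps)):
            if visible is not None and not visible[i][j]:
                continue
            total += 1
            # Euclidean distance between prediction and ground truth.
            if math.hypot(px - gx, py - gy) <= threshold:
                correct += 1
    return correct / total if total else 0.0


# One person, three joints, head size 10 px, so the threshold is 5 px.
pred = [[(10, 10), (20, 20), (30, 30)]]
gt = [[(10, 12), (20, 40), (30, 30)]]
score = pckh(pred, gt, head_sizes=[10])  # two of three joints are within 5 px
```

Because the threshold scales with head size, the score is comparable across people who appear at different scales in the image.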

Code Implementations: 1 repo