Dite-HRNet: Dynamic Lightweight High-Resolution Network for Human Pose Estimation
This work addresses the trade-off between computational efficiency and accuracy in human pose estimation, and is an incremental improvement over existing lightweight networks.
The paper tackles two limitations of high-resolution networks in human pose estimation: high computational complexity and the inability to capture long-range interactions between joints. It proposes Dite-HRNet, which outperforms state-of-the-art lightweight networks on the COCO and MPII datasets.
A high-resolution network exhibits remarkable capability in extracting multi-scale features for human pose estimation, but fails to capture long-range interactions between joints and has high computational complexity. To address these problems, we present a Dynamic lightweight High-Resolution Network (Dite-HRNet), which can efficiently extract multi-scale contextual information and model long-range spatial dependencies for human pose estimation. Specifically, we propose two methods, dynamic split convolution and adaptive context modeling, and embed them into two novel lightweight blocks, named the dynamic multi-scale context block and the dynamic global context block. These two blocks, the basic building units of Dite-HRNet, are specially designed for high-resolution networks to make full use of the parallel multi-resolution architecture. Experimental results show that the proposed network achieves superior performance on both the COCO and MPII human pose estimation datasets, surpassing state-of-the-art lightweight networks. Code is available at: https://github.com/ZiyiZhang27/Dite-HRNet.
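To make the idea of modeling long-range spatial dependency concrete, the following is a minimal sketch of generic attention-based global context pooling. It is an illustrative assumption and not the paper's adaptive context modeling, whose details the abstract does not give. The function name `global_context`, the flat list-of-positions layout, and the `score_weights` vector (standing in for a 1x1 convolution) are all hypothetical.

```python
import math


def global_context(features, score_weights):
    """Generic global context pooling (illustrative, not the paper's method).

    features: list of N spatial positions, each a list of C channel values.
    score_weights: list of C weights, a stand-in for a 1x1 convolution.
    Returns (attention, context, fused).
    """
    # One scalar score per spatial position.
    scores = [sum(w * x for w, x in zip(score_weights, pos)) for pos in features]

    # Softmax over all positions (max subtracted for numerical stability).
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    attention = [e / total for e in exps]

    # Attention-weighted sum over positions: one C-dimensional context vector.
    channels = len(features[0])
    context = [
        sum(a * pos[c] for a, pos in zip(attention, features))
        for c in range(channels)
    ]

    # Add the same context vector to every position, so each location
    # receives information from all others regardless of distance.
    fused = [[x + ctx for x, ctx in zip(pos, context)] for pos in features]
    return attention, context, fused


# Hypothetical toy input: 3 positions, 2 channels.
toy_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attention, context, fused = global_context(toy_features, [0.0, 0.0])
```

With all-zero score weights every position receives equal attention, so the context vector reduces to the spatial mean of the features. A learned scoring function would instead weight informative positions, such as those near joints, more heavily.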