CV · Jul 22, 2021

Adaptive Dilated Convolution For Human Pose Estimation

arXiv:2107.10477v1
Originality: Incremental advance
AI Analysis

This paper addresses a domain-specific issue in human pose estimation, namely generalization across human sizes. It is an incremental advance because it builds on existing methods.

The paper tackles the problem of multi-scale feature misalignment and inflexibility in human pose estimation by proposing an adaptive dilated convolution (ADC) that generates and fuses multi-scale features with the same spatial size using learnable dilation rates, resulting in consistent improvements across various methods.

Most existing human pose estimation (HPE) methods exploit multi-scale information by fusing feature maps of four different spatial sizes, i.e., $1/4$, $1/8$, $1/16$, and $1/32$ of the input image. This strategy has two drawbacks: 1) feature maps of different spatial sizes may not be well aligned spatially, which potentially hurts the accuracy of keypoint localization; 2) these scales are fixed and inflexible, which may restrict generalization across human sizes. To address these issues, we propose an adaptive dilated convolution (ADC). It generates and fuses multi-scale features of the same spatial size by setting different dilation rates for different channels. More importantly, these dilation rates are generated by a regression module, which enables ADC to adaptively adjust the fused scales and thus may help it generalize better to various human sizes. ADC can be trained end-to-end and easily plugged into existing methods. Extensive experiments show that ADC can bring consistent improvements to various HPE methods. The source code will be released for further research.
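
To make the idea of "different dilation rates for different channels" concrete, here is a minimal pure-Python sketch. It is not the authors' implementation. Several details are assumptions that the abstract does not state: the convolution is depthwise with 3x3 kernels, fractional dilation rates are handled by bilinear interpolation, and positions outside the map read as zero. The regression module that would predict the rates is not modeled, so the rates are passed in as plain numbers. The names `bilinear_sample` and `adaptive_dilated_conv` are hypothetical.

```python
import math


def bilinear_sample(fmap, y, x):
    """Sample a 2-D nested list at a real-valued (y, x).

    Assumption: positions outside the map contribute zero (zero padding).
    """
    h, w = len(fmap), len(fmap[0])
    y0, x0 = math.floor(y), math.floor(x)
    wy, wx = y - y0, x - x0
    total = 0.0
    for yy, ay in ((y0, 1.0 - wy), (y0 + 1, wy)):
        for xx, ax in ((x0, 1.0 - wx), (x0 + 1, wx)):
            if 0 <= yy < h and 0 <= xx < w:
                total += ay * ax * fmap[yy][xx]
    return total


def adaptive_dilated_conv(features, kernels, rates):
    """Depthwise 3x3 convolution in which channel c uses dilation rates[c].

    features: list of C maps, each an H x W nested list
    kernels:  list of C 3x3 kernels
    rates:    list of C positive dilation rates (in ADC these would be
              produced by a regression module; here they are given)
    Every output map keeps the input's H x W size, so channels that see
    different scales can be fused without resizing.
    """
    outputs = []
    for fmap, kernel, rate in zip(features, kernels, rates):
        h, w = len(fmap), len(fmap[0])
        result = [[0.0] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                acc = 0.0
                for a in (-1, 0, 1):
                    for b in (-1, 0, 1):
                        # Kernel tap (a, b) is displaced by the channel's rate.
                        acc += kernel[a + 1][b + 1] * bilinear_sample(
                            fmap, i + a * rate, j + b * rate
                        )
                result[i][j] = acc
        outputs.append(result)
    return outputs


# Usage: the same impulse image in two channels, with dilation 1 and 2.
H = W = 7
impulse = [[1.0 if (i, j) == (3, 3) else 0.0 for j in range(W)] for i in range(H)]
ones = [[1.0] * 3 for _ in range(3)]
out = adaptive_dilated_conv([impulse, impulse], [ones, ones], [1.0, 2.0])
```

In this toy input, the channel with rate 1 responds in the 3x3 block around the impulse, while the channel with rate 2 responds on a wider grid with gaps. Both outputs stay 7x7, which is the property the abstract relies on to avoid spatial misalignment when fusing scales.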

