Toward fast and accurate human pose estimation via soft-gated skip connections
This work addresses the need for faster and more accurate human pose estimation, which is crucial for applications such as robotics and human-computer interaction. It is an incremental improvement over existing methods.
The paper aims to improve both the accuracy and the efficiency of human pose estimation by proposing gated skip connections and a hybrid HourGlass-U-Net architecture. It achieves state-of-the-art results on the MPII and LSP datasets and, with a 3x reduction in model size, shows no performance loss compared to the original HourGlass network.
This paper addresses highly accurate and highly efficient human pose estimation. Recent works based on Fully Convolutional Networks (FCNs) have demonstrated excellent results for this difficult problem. While residual connections within FCNs have proved essential for achieving high accuracy, we re-analyze this design choice with the aim of improving both accuracy and efficiency over the state-of-the-art. In particular, we make the following contributions: (a) We propose gated skip connections with per-channel learnable parameters that control the data flow for each channel within each module of the macro-module. (b) We introduce a hybrid network that combines the HourGlass and U-Net architectures, which minimizes the number of identity connections within the network and improves performance for the same parameter budget. Our model achieves state-of-the-art results on the MPII and LSP datasets. In addition, with a 3x reduction in model size and complexity, we show no decrease in performance compared to the original HourGlass network.
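Contribution (a) can be illustrated with a minimal NumPy sketch. The abstract does not give the exact formulation, so the form used here, a per-channel scale `alpha` applied to the skip path and added to the residual branch output, is an assumption made for illustration. The names `soft_gated_skip`, `residual_fn`, and `alpha` are hypothetical, and in a real network `alpha` would be a learned parameter rather than a fixed array.

```python
import numpy as np


def soft_gated_skip(x, residual_fn, alpha):
    """Apply a per-channel gate to the skip path of a residual module.

    x           : feature map of shape (C, H, W)
    residual_fn : callable standing in for the module's residual branch
    alpha       : array of shape (C,), one gate value per channel
                  (assumed form: out = alpha * x + residual_fn(x))
    """
    if alpha.shape != (x.shape[0],):
        raise ValueError("alpha must hold exactly one value per channel")
    # Broadcast alpha over the spatial dimensions so each channel of the
    # skip path is scaled by its own gate value.
    gated_skip = alpha[:, None, None] * x
    return gated_skip + residual_fn(x)


# Toy input: 3 channels, 2x2 spatial resolution, all ones.
x = np.ones((3, 2, 2))
# Gate values chosen by hand here; 0 closes the skip path, 1 is a plain
# identity connection.
alpha = np.array([0.0, 0.5, 1.0])
# Stand-in residual branch that simply doubles its input.
out = soft_gated_skip(x, lambda t: 2.0 * t, alpha)
```

With `alpha` fixed to ones, this sketch reduces to a standard identity skip connection, which is the design choice the paper re-examines.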