CV · Sep 22, 2018

Focus On What's Important: Self-Attention Model for Human Pose Estimation

arXiv:1809.08371v2 · 1 citation
AI Analysis

This paper addresses the problem of redundant image regions in human pose estimation. For computer vision researchers, it offers an incremental improvement through self-learned attention.

The paper tackles human pose estimation by proposing an attention convolutional neural network (ACNN) that learns to focus on important regions like joints while filtering out redundant areas, achieving state-of-the-art performance on the MPII benchmark.

Human pose estimation is an essential yet challenging task in computer vision. One reason for this difficulty is that images contain many redundant regions. In this work, we propose a convolutional network architecture combined with a novel attention model, which we name the attention convolutional neural network (ACNN). ACNN learns to focus on specific regions of different input features. It is a multi-stage architecture: early stages filter out the "nothing-regions", such as background and redundant body parts, and then pass the important regions, which contain the joints of the human body, to the following stages to obtain a more accurate result. Moreover, it does not require extra manual annotations, and self-learning is one of our intentions. We trained the network separately because the attention learning task and the pose estimation task are not independent. State-of-the-art performance is obtained on the MPII benchmark.
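
The abstract does not spell out how the attention is computed, so the following is only a minimal sketch of one common way to realise "filter out nothing-regions, pass on the important ones": a soft spatial attention gate applied stage by stage. The function names (`attention_gate`, `run_stages`), the 1x1-convolution-style scoring, and the sigmoid mask are all assumptions made for illustration, not the paper's ACNN.

```python
import numpy as np


def sigmoid(x):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def attention_gate(features, score_weights, score_bias=0.0):
    """Hypothetical soft spatial attention gate (not the paper's exact model).

    features:      array of shape (C, H, W), a stack of feature maps.
    score_weights: array of shape (C,), acting like a 1x1 convolution that
                   reduces the C channels to one score per spatial location.
    Returns the gated features (C, H, W) and the attention mask (H, W).
    """
    # One scalar score per location: weighted sum over the channel axis.
    scores = np.tensordot(score_weights, features, axes=([0], [0])) + score_bias
    # Squash scores to (0, 1); values near 0 suppress a region.
    mask = sigmoid(scores)
    # Broadcast the (H, W) mask across all channels.
    return features * mask[None, :, :], mask


def run_stages(features, stage_params):
    """Apply several gates in sequence, mimicking a multi-stage pipeline.

    stage_params: list of (score_weights, score_bias) pairs, one per stage.
    Each stage receives the features already filtered by earlier stages.
    """
    masks = []
    for weights, bias in stage_params:
        features, mask = attention_gate(features, weights, bias)
        masks.append(mask)
    return features, masks
```

In a trained network the scoring weights would be learned from the pose loss rather than set by hand, which is one way to read the abstract's point that no extra manual annotations are needed.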
