CV, Nov 4, 2017

Attentional Pooling for Action Recognition

arXiv:1711.01467v3, 334 citations
Originality: Highly original
AI Analysis

The paper addresses action recognition in computer vision, offering a simple yet effective attention mechanism that improves on state-of-the-art performance.

The paper tackles action recognition by introducing an attention module that can be trained with or without extra supervision, achieving a 12.5% relative improvement on the MPII dataset and boosting accuracy across three benchmarks while maintaining similar network size and computational cost.

We introduce a simple yet surprisingly powerful model to incorporate attention in action recognition and human-object interaction tasks. Our proposed attention module can be trained with or without extra supervision, and gives a sizable boost in accuracy while keeping the network size and computational cost nearly the same. It leads to significant improvements over state-of-the-art base architectures on three standard action recognition benchmarks across still images and videos, and establishes a new state of the art on the MPII dataset with a 12.5% relative improvement. We also perform an extensive analysis of our attention module both empirically and analytically. In terms of the latter, we introduce a novel derivation of bottom-up and top-down attention as low-rank approximations of bilinear pooling methods (typically used for fine-grained classification). From this perspective, our attention formulation suggests a novel characterization of action recognition as a fine-grained recognition problem.
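
The low-rank claim in the abstract can be illustrated with a small numerical sketch. It is not the authors' implementation. It assumes a feature map X with n spatial locations and f channels, a bilinear (second-order) pooling score of the form sum_i x_i^T W x_i, and a rank-1 weight matrix W = a b^T. Under those assumptions the score factorizes into a dot product of two per-location maps, Xa and Xb. One possible reading is that one map acts as a class-specific (top-down) term and the other as a class-agnostic (bottom-up) saliency term. All names and sizes below are hypothetical.

```python
import random

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(u_i * v_i for u_i, v_i in zip(u, v))

def bilinear_score(X, W):
    """Second-order pooling score: sum over locations of x^T W x."""
    return sum(dot(x, matvec(W, x)) for x in X)

def rank1_score(X, a, b):
    """Rank-1 case W = a b^T: the score reduces to (Xa) . (Xb)."""
    top_down = matvec(X, a)   # one value per spatial location
    bottom_up = matvec(X, b)  # one value per spatial location
    return dot(top_down, bottom_up)

# Hypothetical sizes: a 7x7 feature map flattened to 49 locations, 8 channels.
rng = random.Random(0)
n, f = 49, 8
X = [[rng.gauss(0, 1) for _ in range(f)] for _ in range(n)]
a = [rng.gauss(0, 1) for _ in range(f)]
b = [rng.gauss(0, 1) for _ in range(f)]

# Explicit f x f rank-1 weight matrix, built only to check the identity.
W = [[a_j * b_k for b_k in b] for a_j in a]

full = bilinear_score(X, W)
low_rank = rank1_score(X, a, b)
print(abs(full - low_rank) < 1e-9)
```

The rank-1 form needs 2f parameters per score instead of f^2, which is consistent with the abstract's statement that the module keeps network size and computational cost nearly the same.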

Code Implementations: 1 repo