CV, Oct 17, 2020

A Grid-based Representation for Human Action Recognition

arXiv:2010.08841v2, 3 citations
AI Analysis

This work addresses the problem of limited temporal fusion and irrelevant information in action recognition for computer vision applications, presenting an incremental improvement.

The paper tackles human action recognition in videos by proposing a grid-based representation that encodes discriminative appearance and pose features, achieving accurate recognition on benchmark datasets despite intra-class variations and occlusions.

Human action recognition (HAR) in videos is a fundamental research topic in computer vision. It consists mainly of understanding actions performed by humans based on a sequence of visual observations. In recent years, HAR has witnessed significant progress, especially with the emergence of deep learning models. However, most existing approaches to action recognition rely on information that is not always relevant to the task, and are limited in the way they fuse temporal information. In this paper, we propose a novel method for human action recognition that efficiently encodes the most discriminative appearance information of an action, with explicit attention on representative pose features, into a new compact grid representation. Our GRAR (Grid-based Representation for Action Recognition) method is tested on several benchmark datasets, demonstrating that our model can accurately recognize human actions despite intra-class appearance variations and occlusion challenges.

