CV · Jul 12, 2021

Human-like Relational Models for Activity Recognition in Video

arXiv:2107.05319v2 · 1 citation
AI Analysis

This work addresses the challenge of improving activity recognition in video for applications such as surveillance and robotics, but the contribution is incremental, as it builds on existing relational models.

The paper tackles video activity recognition with a human-like approach that interprets a video in sequential temporal phases and extracts object-hand relationships within them. On a subset of the something-something dataset, this achieves more robust performance than neural network baselines on challenging activities.

Video activity recognition by deep neural networks is impressive for many classes. However, it falls short of human performance, especially for activities that are challenging to discriminate. Humans differentiate these complex activities by recognising critical spatio-temporal relations among explicitly recognised objects and parts, for example, an object entering the aperture of a container. Deep neural networks can struggle to learn such critical relationships effectively. Therefore, we propose a more human-like approach to activity recognition, which interprets a video in sequential temporal phases and extracts specific relationships among objects and hands in those phases. Random forest classifiers are learnt from these extracted relationships. We apply the method to a challenging subset of the something-something dataset and achieve more robust performance than neural network baselines on challenging activities.
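
The abstract does not say how phases are delimited or which relations are extracted, so the following is only a minimal sketch under stated assumptions, not the authors' pipeline. It assumes each frame is given as hypothetical per-entity bounding boxes (x1, y1, x2, y2), splits a clip into three equal-length phases, and encodes two illustrative relations ("touches" as box overlap, "inside" as the centre of one box lying in another) as the fraction of frames in each phase where they hold. All names here (`split_phases`, `relational_features`, `RELATIONS`, the entity labels) are invented for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical input format: one dict per video frame mapping an entity
# name ("hand", "obj", "container", ...) to a box (x1, y1, x2, y2).
Box = Tuple[float, float, float, float]
Frame = Dict[str, Box]


def split_phases(frames: Sequence[Frame], n_phases: int = 3) -> List[Sequence[Frame]]:
    """Split a clip into n_phases contiguous, roughly equal temporal phases (assumed scheme)."""
    n = len(frames)
    if n < n_phases:
        raise ValueError("need at least one frame per phase")
    cuts = [i * n // n_phases for i in range(n_phases + 1)]
    return [frames[cuts[i]:cuts[i + 1]] for i in range(n_phases)]


def touches(a: Box, b: Box) -> bool:
    """True if the two boxes overlap with positive area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def inside(a: Box, b: Box) -> bool:
    """True if the centre of box a lies within box b."""
    cx, cy = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
    return b[0] <= cx <= b[2] and b[1] <= cy <= b[3]


# Assumed relation vocabulary; the paper's own set of relations may differ.
RELATIONS: Dict[str, Callable[[Box, Box], bool]] = {"touches": touches, "inside": inside}


def relational_features(
    frames: Sequence[Frame],
    pairs: Sequence[Tuple[str, str]],
    n_phases: int = 3,
) -> Dict[str, float]:
    """For each phase, entity pair and relation: fraction of frames where the relation holds."""
    feats: Dict[str, float] = {}
    for p, phase in enumerate(split_phases(frames, n_phases)):
        for a, b in pairs:
            for name, rel in RELATIONS.items():
                hits = sum(1 for f in phase if a in f and b in f and rel(f[a], f[b]))
                feats[f"phase{p}:{a}-{name}-{b}"] = hits / len(phase)
    return feats


# Toy clip: a hand carries an object to a container, releases it inside, and withdraws.
container = (100.0, 100.0, 200.0, 200.0)
clip: List[Frame] = (
    [{"hand": (12.0, 12.0, 32.0, 32.0), "obj": (0.0, 0.0, 20.0, 20.0), "container": container}] * 2
    + [{"hand": (100.0, 100.0, 120.0, 120.0), "obj": (80.0, 80.0, 105.0, 105.0), "container": container}] * 2
    + [{"hand": (300.0, 300.0, 320.0, 320.0), "obj": (140.0, 140.0, 160.0, 160.0), "container": container}] * 2
)

feats = relational_features(clip, pairs=[("hand", "obj"), ("obj", "container")])
# Sorted keys give every clip a vector with the same layout.
vector = [feats[k] for k in sorted(feats)]
```

Because each relation is summarised per phase, the vector length depends only on the number of phases, entity pairs and relations, not on clip length. Stacking such vectors for many labelled clips into `X` with labels `y` would allow fitting a random forest, for example with scikit-learn's `RandomForestClassifier().fit(X, y)`; the paper's actual features and classifier settings are not given here.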
