CV · LG · Jul 13, 2022

Is Appearance Free Action Recognition Possible?

arXiv:2207.06261v1 · 22 citations · h-index: 65
Originality: Incremental advance
AI Analysis

This work addresses the bias of deep-learning video architectures toward static, single-frame information, motivating better motion modeling for action recognition.

The paper tackled the problem of isolating dynamic information in video action recognition by creating the Appearance Free Dataset (AFD), which lacks static cues, and found that 11 contemporary architectures performed notably worse on AFD compared to RGB videos, while humans showed similar accuracy on both.

Intuition might suggest that motion and dynamic information are key to video-based action recognition. In contrast, there is evidence that state-of-the-art deep-learning video understanding architectures are biased toward static information available in single frames. Presently, a methodology and corresponding dataset to isolate the effects of dynamic information in video are missing. Their absence makes it difficult to understand how well contemporary architectures capitalize on dynamic vs. static information. We respond with a novel Appearance Free Dataset (AFD) for action recognition. AFD is devoid of static information relevant to action recognition in a single frame. Modeling of the dynamics is necessary for solving the task, as the action is only apparent through consideration of the temporal dimension. We evaluated 11 contemporary action recognition architectures on AFD as well as its related RGB video. Our results show a notable decrease in performance for all architectures on AFD compared to RGB. We also conducted a complementary study with humans that shows their recognition accuracy on AFD and RGB is very similar and much better than the evaluated architectures on AFD. Our results motivate a novel architecture that revives explicit recovery of optical flow within a contemporary design for best performance on AFD and RGB.
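
To make the idea of "no static information in a single frame" concrete, here is a toy sketch. It is an illustration only, not the AFD construction used in the paper. It assumes a simple setup in which a horizontal band of a random-noise image is shifted sideways from frame to frame; the function name `make_appearance_free_clip` and all of its parameters are hypothetical. Each frame on its own is a rearrangement of the same noise values, yet the moving band becomes visible once consecutive frames are compared.

```python
import numpy as np


def make_appearance_free_clip(height=32, width=32, n_frames=4,
                              band=(8, 16), shift=1, seed=0):
    """Toy clip: random noise in which only a horizontal band moves.

    Hypothetical helper for illustration; not the paper's AFD pipeline.
    """
    rng = np.random.default_rng(seed)
    frames = [rng.random((height, width))]  # first frame: pure noise
    top, bottom = band
    for _ in range(n_frames - 1):
        nxt = frames[-1].copy()
        # Translate the band's texture horizontally (with wrap-around),
        # leaving the rest of the frame unchanged.
        nxt[top:bottom] = np.roll(nxt[top:bottom], shift, axis=1)
        frames.append(nxt)
    return np.stack(frames)  # shape: (n_frames, height, width)


clip = make_appearance_free_clip()

# Temporal cue: pixels that change between consecutive frames.
motion_mask = np.abs(np.diff(clip, axis=0)).sum(axis=0) > 0

# Single-frame cue: every frame holds exactly the same set of pixel values,
# so intensity statistics of one frame do not reveal where the band is.
same_values = all(
    np.array_equal(np.sort(f.ravel()), np.sort(clip[0].ravel())) for f in clip
)
```

In this sketch `motion_mask` is nonzero only inside the moving band, while `same_values` confirms that the frames do not differ in their intensity content. A model restricted to single frames has nothing to latch onto, whereas one that compares frames over time can locate the motion.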

Code Implementations: 1 repo
