CV · Apr 25, 2022

Temporal Relevance Analysis for Video Action Models

arXiv:2204.11929v1 · 1 citation · h-index: 79
Originality: Synthesis-oriented
AI Analysis

The paper offers insights for researchers in video action recognition, though its contribution is incremental: it analyzes existing methods rather than introducing new ones.

The paper addresses the problem of understanding temporal modeling in video action recognition. It proposes a method, based on layer-wise relevance propagation, to quantify the temporal relationships between frames that CNN-based action models capture. Its two main findings are that temporal relevance is not strongly correlated with model performance, and that action models tend to capture local temporal information but fewer long-range dependencies.

In this paper, we provide a deep analysis of temporal modeling for action recognition, an important but underexplored problem in the literature. We first propose a new approach to quantify the temporal relationships between frames captured by CNN-based action models based on layer-wise relevance propagation. We then conduct comprehensive experiments and in-depth analysis to provide a better understanding of how temporal modeling is affected by various factors such as dataset, network architecture, and input frames. With this, we further study some important questions for action recognition that lead to interesting findings. Our analysis shows that there is no strong correlation between temporal relevance and model performance; and action models tend to capture local temporal information, but less long-range dependencies. Our codes and models will be publicly available.
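To make the idea of attributing a prediction back to individual frames more concrete, here is a minimal sketch of layer-wise relevance propagation using the epsilon rule. It is not the paper's implementation: it assumes a toy two-layer fully connected ReLU network without biases instead of a CNN, treats the input as a flat list of per-frame features, and simply sums the input relevance per frame. All names (`forward`, `lrp_epsilon`, `frame_relevance`) and all weights are hypothetical and chosen for illustration only.

```python
def forward(x, W1, W2):
    """Toy two-layer ReLU network without biases (an assumption, not the paper's model)."""
    h = [max(0.0, sum(x[i] * W1[i][j] for i in range(len(x))))
         for j in range(len(W1[0]))]
    y = [sum(h[j] * W2[j][k] for j in range(len(h)))
         for k in range(len(W2[0]))]
    return h, y


def lrp_epsilon(a, W, R_out, eps=1e-9):
    """Redistribute relevance R_out from a layer's outputs to its inputs a (epsilon rule)."""
    n_in, n_out = len(a), len(R_out)
    # Pre-activations of the layer for the given inputs.
    z = [sum(a[i] * W[i][j] for i in range(n_in)) for j in range(n_out)]
    # Stabilise the denominator so it never hits zero.
    s = [R_out[j] / (z[j] + (eps if z[j] >= 0 else -eps)) for j in range(n_out)]
    # Each input receives relevance in proportion to its contribution a_i * w_ij.
    return [a[i] * sum(W[i][j] * s[j] for j in range(n_out)) for i in range(n_in)]


def frame_relevance(x, W1, W2, target, num_frames):
    """Relevance of each frame for the score of class `target`."""
    h, y = forward(x, W1, W2)
    # Start from the target score only; all other classes get zero relevance.
    R_y = [y[k] if k == target else 0.0 for k in range(len(y))]
    R_h = lrp_epsilon(h, W2, R_y)
    R_x = lrp_epsilon(x, W1, R_h)
    d = len(x) // num_frames  # features per frame
    return [sum(R_x[t * d:(t + 1) * d]) for t in range(num_frames)], y


# Three frames with two features each; the middle frame is all zeros.
x = [1.0, 0.5, 0.0, 0.0, 2.0, 1.0]
W1 = [[0.5, -0.2], [0.1, 0.3], [0.4, 0.4], [-0.3, 0.2], [0.2, 0.6], [-0.1, 0.1]]
W2 = [[1.0, -0.5], [0.5, 1.0]]
per_frame, scores = frame_relevance(x, W1, W2, target=0, num_frames=3)
print(per_frame, scores[0])
```

In this bias-free setting the per-frame relevances sum (up to the small epsilon term) to the target class score, so each value can be read as that frame's share of the prediction. How the paper turns such relevance into a measure of temporal relationships between frames is not specified in the abstract and is not reproduced here.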

