CVAug 26, 2020

A Prospective Study on Sequence-Driven Temporal Sampling and Ego-Motion Compensation for Action Recognition in the EPIC-Kitchens Dataset

arXiv:2008.11588v12 citations
Originality Incremental advance
AI Analysis

This work addresses action recognition for egocentric video analysis, but it is incremental as it builds on existing CNN-based methods with motion compensation.

The paper tackles the problem of action recognition in egocentric videos by compensating for ego-motion to improve temporal receptive fields, resulting in a method that achieves competitive performance on the EPIC-Kitchens dataset.

Action recognition is currently one of the top-challenging research fields in computer vision. Convolutional Neural Networks (CNNs) have significantly boosted its performance but rely on fixed-size spatio-temporal windows of analysis, reducing CNNs temporal receptive fields. Among action recognition datasets, egocentric recorded sequences have become of important relevance while entailing an additional challenge: ego-motion is unavoidably transferred to these sequences. The proposed method aims to cope with it by estimating this ego-motion or camera motion. The estimation is used to temporally partition video sequences into motion-compensated temporal \textit{chunks} showing the action under stable backgrounds and allowing for a content-driven temporal sampling. A CNN trained in an end-to-end fashion is used to extract temporal features from each \textit{chunk}, which are late fused. This process leads to the extraction of features from the whole temporal range of an action, increasing the temporal receptive field of the network.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes