CV · Jun 16, 2018

Two Stream Self-Supervised Learning for Action Recognition

arXiv:1806.07383v1 · 12 citations
Originality: Incremental advance
AI Analysis

This work addresses action recognition for video analysis, presenting an incremental improvement in self-supervised learning methods.

The paper tackles action recognition by proposing a self-supervised approach that uses a two-stream architecture to learn spatial and temporal representations from video frames, validated on the HMDB51, UCF101, and HDD datasets.

We present a self-supervised approach that exploits spatio-temporal signals between video frames for action recognition. A two-stream architecture is leveraged to couple spatial and temporal representation learning. Our task is formulated as both a sequence verification task and a spatio-temporal alignment task. The former requires understanding the temporal structure of motion, while the latter couples the learned motion with the spatial representation. The effectiveness of the self-supervised pre-trained weights is validated on the action recognition task. Quantitative evaluation shows the competence of the self-supervised approach on three datasets: HMDB51, UCF101, and the Honda Research Institute Driving Dataset (HDD). Further investigation is still required to boost performance and to establish the approach's general validity.
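The two pretext tasks named in the abstract can be illustrated with a minimal sketch of how self-supervised labels might be generated from an unlabeled video. This is not the authors' implementation: the function names `make_sequence_sample` and `make_alignment_sample`, the tuple length, the window size, and the 50/50 positive/negative split are hypothetical assumptions, since the abstract does not specify the sampling scheme. Sequence verification labels a tuple of frame indices as correctly ordered (1) or shuffled (0); spatio-temporal alignment labels a (spatial frame, temporal window) pair as aligned (1) when the frame lies inside the window the motion stream sees, and misaligned (0) otherwise.

```python
import random
from typing import List, Tuple


def make_sequence_sample(num_frames: int, tuple_len: int,
                         rng: random.Random) -> Tuple[List[int], int]:
    """Hypothetical sequence-verification sample.

    Returns frame indices plus a label: 1 if in temporal order, 0 if shuffled.
    """
    assert 2 <= tuple_len <= num_frames, "need at least 2 distinct frames"
    # Pick distinct frame indices and sort them to get the correct order.
    idx = sorted(rng.sample(range(num_frames), tuple_len))
    if rng.random() < 0.5:
        return idx, 1  # positive: correct temporal order
    shuffled = idx[:]
    # Reshuffle until the order really differs from the sorted one.
    while shuffled == idx:
        rng.shuffle(shuffled)
    return shuffled, 0  # negative: wrong temporal order


def make_alignment_sample(num_frames: int, window: int,
                          rng: random.Random) -> Tuple[int, Tuple[int, int], int]:
    """Hypothetical spatio-temporal alignment sample.

    Pairs one spatial frame with a half-open temporal window [start, end)
    that would feed the motion stream; label 1 if the frame lies inside it.
    """
    assert 1 <= window < num_frames, "need frames outside the window for negatives"
    start = rng.randrange(0, num_frames - window + 1)
    end = start + window
    if rng.random() < 0.5:
        frame = rng.randrange(start, end)  # positive: frame inside the window
        return frame, (start, end), 1
    # Negative: any frame outside the window (guaranteed non-empty by the assert).
    outside = [f for f in range(num_frames) if not (start <= f < end)]
    return rng.choice(outside), (start, end), 0


rng = random.Random(0)
print(make_sequence_sample(num_frames=30, tuple_len=3, rng=rng))
print(make_alignment_sample(num_frames=30, window=8, rng=rng))
```

In a two-stream setup, the returned indices would select RGB frames for the spatial stream and a motion input for the temporal stream (for example, stacked frame differences or optical flow, again an assumption). Only the labels are derived from the video itself, which is what makes both tasks self-supervised.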

