CVAug 13, 2020

Learning Temporally Invariant and Localizable Features via Data Augmentation for Video Recognition

arXiv:2008.05721v129 citationsHas Code
Originality Incremental advance
AI Analysis

This work addresses the need for more efficient and robust video recognition models, particularly in data-scarce scenarios, though it is incremental as it builds on existing spatial augmentation methods.

The authors tackled the problem of improving video recognition by extending spatial data augmentation strategies to the temporal dimension, resulting in enhanced performance with limited training data, as demonstrated in the VIPriors challenge.

Deep-Learning-based video recognition has shown promising improvements along with the development of large-scale datasets and spatiotemporal network architectures. In image recognition, learning spatially invariant features is a key factor in improving recognition performance and robustness. Data augmentation based on visual inductive priors, such as cropping, flipping, rotating, or photometric jittering, is a representative approach to achieve these features. Recent state-of-the-art recognition solutions have relied on modern data augmentation strategies that exploit a mixture of augmentation operations. In this study, we extend these strategies to the temporal dimension for videos to learn temporally invariant or temporally localizable features to cover temporal perturbations or complex actions in videos. Based on our novel temporal data augmentation algorithms, video recognition performances are improved using only a limited amount of training data compared to the spatial-only data augmentation algorithms, including the 1st Visual Inductive Priors (VIPriors) for data-efficient action recognition challenge. Furthermore, learned features are temporally localizable that cannot be achieved using spatial augmentation algorithms. Our source code is available at https://github.com/taeoh-kim/temporal_data_augmentation.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes