CV, LG · Jul 25, 2019

Submission to ActivityNet Challenge 2019: Task B Spatio-temporal Action Localization

arXiv:1907.10837v1
Originality: Incremental advance
AI Analysis

This work addresses action localization in videos for computer vision applications, presenting incremental improvements over existing methods.

The authors tackled spatio-temporal action localization by proposing an end-to-end trainable architecture using only RGB sequential images, achieving improved performance through data augmentation and label subsampling methods.

This technical report presents an overview of our system for the spatio-temporal action localization (SAL) task in ActivityNet Challenge 2019. Unlike previous two-stream-based works, we focus on exploring an end-to-end trainable architecture that uses only RGB sequential images. To this end, we employ SlowFast Networks, a previously proposed simple yet effective two-branch network that is capable of capturing both short- and long-term spatiotemporal features. Moreover, to handle the severe class imbalance and overfitting problems, we propose a correlation-preserving data augmentation method and a random label subsampling method, both of which have been shown to reduce overfitting and improve performance.
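As a rough illustration of how a two-branch SlowFast-style network can see both short- and long-term motion from the same RGB clip, the sketch below samples frame indices for the two pathways. The function name `slowfast_frame_indices` and the defaults (`slow_stride=16`, `alpha=8`) are assumptions chosen for illustration, following commonly cited SlowFast settings; the report does not state the configuration it actually used.

```python
def slowfast_frame_indices(num_frames, slow_stride=16, alpha=8):
    """Return (slow, fast) frame indices for a clip of `num_frames` frames.

    Hypothetical helper: the slow pathway samples sparsely (long-range
    context), while the fast pathway samples `alpha` times more densely
    (fine-grained motion). Both read the same RGB frames.
    """
    if slow_stride % alpha != 0:
        raise ValueError("slow_stride must be divisible by alpha")
    fast_stride = slow_stride // alpha
    slow = list(range(0, num_frames, slow_stride))
    fast = list(range(0, num_frames, fast_stride))
    return slow, fast


if __name__ == "__main__":
    slow, fast = slowfast_frame_indices(64)
    print(slow)       # indices fed to the slow pathway
    print(len(fast))  # number of frames fed to the fast pathway
```

With these assumed defaults, a 64-frame clip yields 4 frames for the slow pathway and 32 for the fast pathway, so neither optical flow nor a second input modality is needed.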

