CV · Aug 2, 2016

CUHK & ETHZ & SIAT Submission to ActivityNet Challenge 2016

arXiv:1608.00797v1 · 150 citations
Originality: synthesis-oriented
AI Analysis

This work targets untrimmed video classification and is relevant to both researchers and practitioners. Its contribution is incremental, since it builds on the existing temporal segment network (TSN) framework.
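Temporal segment networks avoid processing every frame of a video. The video is split into a fixed number of equal-length segments, one short snippet is sampled from each segment, and the per-snippet class scores are combined by a consensus function. The Python sketch below illustrates that sparse sampling and the default average consensus under simplifying assumptions. It works on frame indices and plain score lists rather than a real network. The helper names (`sample_snippet_indices`, `average_consensus`) are hypothetical, and taking each segment's centre frame at test time is an illustrative choice, not the submission's documented procedure.

```python
import random
from typing import List, Sequence


def sample_snippet_indices(num_frames: int, num_segments: int,
                           training: bool = True) -> List[int]:
    """Pick one frame index from each of `num_segments` equal-length spans.

    Training: a random frame inside each span (TSN-style sparse sampling).
    Testing: the centre frame of each span (an illustrative assumption).
    """
    if num_frames < 1 or num_segments < 1:
        raise ValueError("num_frames and num_segments must be positive")
    span = num_frames / num_segments
    indices = []
    for k in range(num_segments):
        start = int(k * span)
        # Guarantee a non-empty span even when num_frames < num_segments.
        end = max(start + 1, int((k + 1) * span))
        if training:
            idx = random.randrange(start, end)
        else:
            idx = (start + end - 1) // 2
        indices.append(min(idx, num_frames - 1))
    return indices


def average_consensus(snippet_scores: Sequence[Sequence[float]]) -> List[float]:
    """Average per-class scores over snippets (TSN's default consensus)."""
    n = len(snippet_scores)
    return [sum(class_scores) / n for class_scores in zip(*snippet_scores)]
```

For a 300-frame video split into 3 segments, `sample_snippet_indices(300, 3, training=False)` returns `[49, 149, 249]`, one index near the middle of each 100-frame span.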

The paper tackles the untrimmed video classification task of the ActivityNet Challenge 2016 with an ensemble of deep models that combines ResNet and Inception V3 backbones, top-k and attention-weighted pooling, and an audio CNN. The ensemble reaches a mean average precision (mAP) of 93.23% on the test set and took first place in the challenge.
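The reported 93.23% is mean average precision rather than plain accuracy. Average precision (AP) is computed per class by ranking all test videos by that class's score and averaging the precision at the rank of each positive video. mAP is then the mean of the per-class APs. The sketch below uses this common non-interpolated definition as an assumption for illustration. The official ActivityNet evaluation script may differ in interpolation and tie handling, and the function names here are hypothetical.

```python
from typing import Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Non-interpolated AP for one class: mean precision at each positive's rank."""
    # Rank videos by score, highest first (ties keep their input order).
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, precisions = 0, []
    for rank, (_, is_positive) in enumerate(ranked, start=1):
        if is_positive:
            hits += 1
            precisions.append(hits / rank)
    # Assumption: a class with no positive videos contributes an AP of 0.
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(score_matrix: Sequence[Sequence[float]],
                           label_matrix: Sequence[Sequence[int]]) -> float:
    """score_matrix[v][c] is video v's score for class c; label_matrix[v][c] is 0 or 1."""
    num_classes = len(score_matrix[0])
    aps = [
        average_precision([row[c] for row in score_matrix],
                          [row[c] for row in label_matrix])
        for c in range(num_classes)
    ]
    return sum(aps) / num_classes
```

With four videos and two classes, `scores = [[0.9, 0.1], [0.3, 0.7], [0.4, 0.6], [0.8, 0.2]]` and one-hot `labels = [[1, 0], [0, 1], [1, 0], [0, 1]]`, each class has one positive ranked first and one ranked third. Both APs are therefore (1 + 2/3) / 2 = 5/6, and the mAP is about 0.833.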

This paper presents the method behind our submission to the untrimmed video classification task of the ActivityNet Challenge 2016. We follow the basic pipeline of temporal segment networks and further improve performance with several additional techniques. Specifically, we use recent deep architectures, namely ResNet and Inception V3, and introduce new aggregation schemes (top-k and attention-weighted pooling). Additionally, we incorporate audio as a complementary channel, extracting relevant information with a CNN applied to spectrograms. With these techniques, we derive an ensemble of deep models that together achieve high classification accuracy (93.23% mAP) on the test set and secured first place in the challenge.
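Top-k pooling and attention-weighted pooling both replace TSN's plain averaging of snippet scores. The usual motivation is that in untrimmed videos many snippets show background rather than the labelled activity, so uniform averaging dilutes the signal. The sketch below is a minimal illustration that assumes snippet-level class scores are already available. Top-k pooling takes, for each class, the mean of the k highest snippet scores. Attention-weighted pooling takes a softmax-weighted sum of snippet scores. In the paper the attention weights come from the network itself; here `attention_logits` is a hypothetical input standing in for them, and k is left to the caller because its value is not given in this summary.

```python
import math
from typing import List, Sequence


def top_k_pooling(snippet_scores: Sequence[Sequence[float]], k: int) -> List[float]:
    """For each class, average the k highest scores across snippets."""
    if not 1 <= k <= len(snippet_scores):
        raise ValueError("k must be between 1 and the number of snippets")
    pooled = []
    for class_scores in zip(*snippet_scores):  # one tuple of snippet scores per class
        best = sorted(class_scores, reverse=True)[:k]
        pooled.append(sum(best) / k)
    return pooled


def attention_weighted_pooling(snippet_scores: Sequence[Sequence[float]],
                               attention_logits: Sequence[float]) -> List[float]:
    """Weight each snippet by softmax(attention_logits), then sum per-class scores."""
    if len(attention_logits) != len(snippet_scores):
        raise ValueError("need exactly one attention logit per snippet")
    peak = max(attention_logits)  # subtract the max for a numerically stable softmax
    exps = [math.exp(a - peak) for a in attention_logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * s for w, s in zip(weights, class_scores))
            for class_scores in zip(*snippet_scores)]
```

Both schemes interpolate between familiar baselines. With k = 1, top-k pooling reduces to max pooling, and with k equal to the number of snippets it recovers average pooling. Equal attention logits likewise give average pooling, while one dominant logit lets a single snippet decide the video-level score.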
