CVAIJan 21, 2021

Bridging the gap between Human Action Recognition and Online Action Detection

arXiv:2101.08851v1
Originality Incremental advance
AI Analysis

This work addresses the problem of task-specific feature extraction for online action detection in computer vision, representing an incremental improvement over existing methods.

The paper tackles the suboptimal feature extraction in online action detection by introducing a teacher-student framework with a novel training strategy, achieving state-of-the-art results on NTU RGB+D and PKU MMD datasets without using future knowledge.

Action recognition, early prediction, and online action detection are complementary disciplines that are often studied independently. Most online action detection networks use a pre-trained feature extractor, which might not be optimal for its new task. We address the task-specific feature extraction with a teacher-student framework between the aforementioned disciplines, and a novel training strategy. Our network, Online Knowledge Distillation Action Detection network (OKDAD), embeds online early prediction and online temporal segment proposal subnetworks in parallel. Low interclass and high intraclass similarity are encouraged during teacher training. Knowledge distillation to the OKDAD network is ensured via layer reuse and cosine similarity between teacher-student feature vectors. Layer reuse and similarity learning significantly improve our baseline which uses a generic feature extractor. We evaluate our framework on infrared videos from two popular datasets, NTU RGB+D (action recognition, early prediction) and PKU MMD (action detection). Unlike previous attempts on those datasets, our student networks perform without any knowledge of the future. Even with this added difficulty, we achieve state-of-the-art results on both datasets. Moreover, our networks use infrared from RGB-D cameras, which we are the first to use for online action detection, to our knowledge.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes