CV · May 4, 2018

Object and Text-guided Semantics for CNN-based Activity Recognition

arXiv:1805.01818v1 · 1 citation
Originality: Incremental advance
AI Analysis

This work addresses activity recognition for video analysis by integrating object and text semantics, offering an incremental improvement over existing methods.

The paper tackles video-based human activity recognition with a novel CNN that co-learns object recognition and uses text-guided semantics to select relevant objects, improving on baseline performance.

Many previous methods have demonstrated the importance of considering semantically relevant objects for video-based human activity recognition, yet none has harnessed large text corpora to relate objects to activities and transfer that knowledge into a unified deep convolutional neural network. We present a novel activity recognition CNN that co-learns the object recognition task in an end-to-end multitask learning scheme, improving on baseline activity recognition performance. We further improve on the multitask learning approach by exploiting a text-guided semantic space to select the objects most relevant to the target activities. To the best of our knowledge, we are the first to investigate this approach.
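The abstract combines two ideas: choosing auxiliary object classes by their semantic closeness to the target activities, and training activity and object recognition jointly. The framework-free sketch below illustrates both under stated assumptions; it is not the authors' implementation. Object relevance is scored as the cosine similarity between word vectors of object and activity names, where the `EMBEDDINGS` table is toy, hypothetical data standing in for vectors learned from a large text corpus. The multitask objective is assumed to be a softmax cross-entropy on the activity label plus a weighted multi-label binary cross-entropy on object presence, with `weight` a hypothetical hyperparameter. In a real network, both logit vectors would come from heads sharing one CNN backbone.

```python
import math

# Hypothetical toy word vectors; real corpus-trained embeddings (e.g. word2vec,
# GloVe) would have hundreds of dimensions. These are NOT from the paper.
EMBEDDINGS = {
    "cooking": [0.9, 0.1, 0.0],
    "tennis":  [0.1, 0.1, 0.9],
    "pan":     [0.8, 0.2, 0.1],
    "knife":   [0.7, 0.3, 0.0],
    "ball":    [0.0, 0.1, 0.9],
    "racket":  [0.1, 0.0, 0.8],
    "cloud":   [0.0, 1.0, 0.0],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def select_objects(activities, objects, k):
    """Keep the k objects whose best similarity to any target activity is highest."""
    scores = {
        obj: max(cosine(EMBEDDINGS[obj], EMBEDDINGS[act]) for act in activities)
        for obj in objects
    }
    return sorted(objects, key=lambda o: scores[o], reverse=True)[:k]


def softmax_cross_entropy(logits, target):
    """Activity loss: -log softmax(logits)[target], computed stably via log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def binary_cross_entropy(logits, targets):
    """Object loss: mean sigmoid BCE over the selected object labels (stable form)."""
    total = 0.0
    for x, y in zip(logits, targets):
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


def multitask_loss(activity_logits, activity_label, object_logits, object_labels, weight=0.5):
    """Assumed joint objective: activity loss plus weighted auxiliary object loss."""
    return softmax_cross_entropy(activity_logits, activity_label) + weight * binary_cross_entropy(
        object_logits, object_labels
    )
```

For example, `select_objects(["cooking", "tennis"], ["pan", "knife", "ball", "racket", "cloud"], k=4)` keeps the kitchen and tennis objects and drops the unrelated `cloud`, so only semantically relevant objects would receive an output unit in the object head. Using the maximum similarity over activities (rather than the mean) is one plausible design choice: it keeps objects that matter strongly for a single activity even if they are irrelevant to the others.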
