CV

May 4, 2018

Object and Text-guided Semantics for CNN-based Activity Recognition

Sungmin Eum, Christopher Reale, Heesung Kwon, Claire Bonial, Clare Voss

arXiv:1805.01818v1

2.51 citations

Originality: Incremental advance

AI Analysis

This work addresses activity recognition for video analysis by integrating object and text semantics, offering an incremental improvement over existing methods.

The paper tackles video-based human activity recognition by co-learning object recognition and using text-guided semantics to select relevant objects, improving baseline performance with a novel CNN approach.

Many previous methods have demonstrated the importance of considering semantically relevant objects for video-based human activity recognition, yet none has harnessed large text corpora to relate objects to activities and transferred that knowledge into learning a unified deep convolutional neural network. We present a novel activity recognition CNN that co-learns the object recognition task in an end-to-end multitask learning scheme to improve upon baseline activity recognition performance. We further improve upon the multitask learning approach by exploiting a text-guided semantic space to select the objects most relevant to the target activities. To the best of our knowledge, we are the first to investigate this approach.
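The two ideas in the abstract, text-guided object selection and multitask co-learning, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy 3-dimensional "embeddings", the use of cosine similarity as the relevance score, the function names, and the weighted-sum form of the joint loss with its `weight` parameter. The abstract does not specify the text corpus, the embedding model, or the loss.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_relevant_objects(activity_vecs, object_vecs, k):
    """Keep the k objects whose best similarity to any target activity is highest.

    Hypothetical helper: scores each object by its maximum cosine similarity
    to the activity vectors in a shared text-derived embedding space.
    """
    scores = {
        obj: max(cosine(vec, act) for act in activity_vecs.values())
        for obj, vec in object_vecs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


def cross_entropy(probs, label):
    """Negative log-likelihood of the true label under predicted probabilities."""
    return -math.log(probs[label])


def multitask_loss(activity_probs, activity_label,
                   object_probs, object_label, weight=0.5):
    """Assumed joint objective: activity loss plus a weighted object loss."""
    return (cross_entropy(activity_probs, activity_label)
            + weight * cross_entropy(object_probs, object_label))


# Toy vectors standing in for embeddings learned from a text corpus.
activities = {"riding_bike": [0.9, 0.1, 0.0], "cooking": [0.0, 0.2, 0.9]}
objects = {
    "bicycle": [1.0, 0.0, 0.1],
    "frying_pan": [0.1, 0.1, 1.0],
    "sofa": [0.1, 1.0, 0.1],
}

# Objects retained as labels for the auxiliary object recognition task.
selected = select_relevant_objects(activities, objects, k=2)

# Joint loss for one sample, given softmax outputs from the two heads.
loss = multitask_loss([0.7, 0.3], 0, [0.6, 0.4], 0, weight=0.5)
```

In this sketch, the selected objects would define the label set for the object recognition head, so that the shared network is trained only on objects that the text-derived space considers related to the target activities.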
