CV, Oct 20, 2017

Generalized Zero-Shot Learning for Action Recognition with Web-Scale Video Data

arXiv:1710.07455v1, 40 citations
Originality: Incremental advance
AI Analysis

This work addresses the recognition of rare or unseen actions in surveillance videos, a problem that matters for public safety. The advance is incremental: it adapts zero-shot learning to a more realistic setting.

The paper tackles action recognition in surveillance videos under a generalized zero-shot setting, where the test data includes both seen and unseen classes. It proposes a method that transfers knowledge from web-scale video data to detect anomalous actions, and evaluates the method on a new surveillance dataset of nine action classes related to public safety.

Action recognition in surveillance video makes our lives safer by detecting criminal events and predicting violent emergencies. However, efficient action recognition is not free of difficulty. First, there are so many action classes in daily life that we cannot define all of them beforehand. Moreover, it is very hard to collect real-world videos of certain actions, such as stealing and street fighting, due to legal restrictions and privacy protection. These challenges make existing data-driven recognition methods insufficient to attain the desired performance. Zero-shot learning has the potential to solve these issues, since it can perform classification without positive examples. Nevertheless, current zero-shot learning algorithms have been studied under the unreasonable setting where seen classes are absent during the testing phase. Motivated by this, we study the task of action recognition in surveillance video under a more realistic generalized zero-shot setting, where the testing data contains both seen and unseen classes. To the best of our knowledge, this is the first work to study video action recognition under the generalized zero-shot setting. We first perform extensive empirical studies of several existing zero-shot learning approaches under this new setting on web-scale video data. Our experimental results demonstrate that, under the generalized setting, typical zero-shot learning methods are no longer effective on the dataset we used. Then, we propose a method for action recognition that deploys generalized zero-shot learning and transfers the knowledge of web video to detect anomalous actions in surveillance videos. To verify the effectiveness of our proposed method, we further construct a new surveillance video dataset consisting of nine action classes related to public safety.
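The difference between the conventional and the generalized zero-shot settings can be illustrated with a small sketch. This is not the paper's method: it assumes a simple nearest-prototype classifier in a shared semantic space, and the class names, prototype vectors, and test embeddings are hypothetical toy values chosen only to show how adding seen classes to the candidate set can hurt accuracy on unseen classes.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def predict(embedding, prototypes, candidates):
    """Return the candidate class whose prototype is most similar."""
    return max(candidates, key=lambda c: cosine(embedding, prototypes[c]))

def accuracy(samples, prototypes, candidates):
    """Fraction of (embedding, label) samples classified correctly."""
    hits = sum(predict(e, prototypes, candidates) == y for e, y in samples)
    return hits / len(samples)

# Hypothetical class prototypes (e.g. word vectors of class names).
prototypes = {
    "walk":  [1.0, 0.0, 0.0],   # seen during training
    "run":   [0.8, 0.6, 0.0],   # seen during training
    "fight": [0.0, 1.0, 0.2],   # unseen: no training videos
    "steal": [0.0, 0.2, 1.0],   # unseen: no training videos
}
seen, unseen = ["walk", "run"], ["fight", "steal"]

# Hypothetical test embeddings. The first "fight" sample is deliberately
# biased toward the seen class "run", as a model trained only on seen
# classes might produce.
unseen_samples = [([0.6, 0.8, 0.1], "fight"), ([0.1, 0.2, 0.9], "steal")]
seen_samples = [([0.95, 0.1, 0.0], "walk"), ([0.7, 0.7, 0.0], "run")]

# Conventional setting: only unseen classes are candidates at test time.
conventional_acc = accuracy(unseen_samples, prototypes, unseen)

# Generalized setting: seen and unseen classes compete at test time.
generalized_unseen_acc = accuracy(unseen_samples, prototypes, seen + unseen)
generalized_all_acc = accuracy(seen_samples + unseen_samples,
                               prototypes, seen + unseen)

print(conventional_acc, generalized_unseen_acc, generalized_all_acc)
```

In this toy setup the biased "fight" sample is classified correctly when only unseen classes are candidates, but is pulled to the seen class "run" once all classes compete, which is the kind of degradation the abstract describes for typical zero-shot methods under the generalized setting.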

