Learning to detect video events from zero or very few video examples
This work addresses the challenge of video event detection for applications such as surveillance and content analysis, but its contribution is incremental: it builds on existing methods with specific adaptations.
The paper tackles the problem of detecting high-level events in video using only textual descriptions or very few positive examples, proposing a learning framework and an extended SVM method to incorporate related videos. Experimental results on the TRECVID MED 2014 dataset demonstrate the effectiveness of these approaches.
In this work we deal with the problem of high-level event detection in video. Specifically, we study two challenging problems: i) learning to detect video events solely from a textual description of the event, without using any positive video examples, and ii) additionally exploiting very few positive training samples together with a small number of "related" videos. For learning only from an event's textual description, we first identify a general learning framework and then study the impact of different design choices for the various stages of this framework. For additionally learning from example videos when true positive training samples are scarce, we employ an extension of the Support Vector Machine that allows us to exploit "related" event videos by automatically introducing different weights for subsets of the videos in the overall training set. Experimental evaluations performed on the large-scale TRECVID MED 2014 video dataset provide insight into the effectiveness of the proposed methods.
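To make the zero-example setting concrete, here is a minimal sketch of one possible instance of a text-only pipeline: match the event description against a pool of concept names, keep the best-matching concepts, and rank videos by a similarity-weighted average of their concept detector scores. Every stage here is an assumption chosen for illustration and is not the design studied in the paper. The word-overlap (Jaccard) matching, the `top_k` cut-off, the concept names, and the detector scores are all hypothetical.

```python
import re
from typing import Dict, List, Set, Tuple

def tokenize(text: str) -> Set[str]:
    """Lower-cased word set: a deliberately crude text representation (assumption)."""
    return set(re.findall(r"[a-z]+", text.lower()))

def jaccard(a: Set[str], b: Set[str]) -> float:
    """Word-overlap similarity between two word sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def select_concepts(event_text: str, concept_names: List[str],
                    top_k: int = 3) -> List[Tuple[str, float]]:
    """Keep the top_k concepts whose names overlap the event description."""
    event_words = tokenize(event_text)
    scored = [(name, jaccard(event_words, tokenize(name))) for name in concept_names]
    scored = [s for s in scored if s[1] > 0.0]
    scored.sort(key=lambda s: s[1], reverse=True)  # stable sort: ties keep pool order
    return scored[:top_k]

def score_video(concept_scores: Dict[str, float],
                selected: List[Tuple[str, float]]) -> float:
    """Similarity-weighted average of the selected concepts' detector scores."""
    total = sum(weight for _, weight in selected)
    if total == 0.0:
        return 0.0
    return sum(weight * concept_scores.get(name, 0.0)
               for name, weight in selected) / total

# Hypothetical event description, concept pool and per-video detector outputs.
event_text = "Attempting a bike trick: a person performs a stunt on a bike"
concept_pool = ["bike", "person riding bike", "kitchen", "dog", "stunt"]
videos = {
    "v1": {"bike": 0.9, "person riding bike": 0.8, "stunt": 0.7, "kitchen": 0.1},
    "v2": {"bike": 0.1, "person riding bike": 0.05, "stunt": 0.0,
           "kitchen": 0.9, "dog": 0.8},
}

selected = select_concepts(event_text, concept_pool)
ranking = sorted(videos, key=lambda v: score_video(videos[v], selected), reverse=True)
```

Each stage (text representation, concept selection, score fusion) is a separate function, so alternative choices can be swapped in one at a time. Examples include embedding-based similarity instead of word overlap, or a different fusion rule.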
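The idea of giving subsets of the training set different importance can be illustrated with an instance-weighted linear SVM. The sketch below minimises $\frac{\lambda}{2}\lVert w\rVert^2 + \frac{1}{n}\sum_i c_i \max(0, 1 - y_i(w^\top x_i + b))$ by full-batch subgradient descent. Here "related" videos are treated as positives with a smaller per-sample cost $c_i$. This is a swapped-in technique, not the paper's method: the weights below are fixed by hand (the value 0.3 is a hypothetical choice), whereas the abstract describes weights that are introduced automatically. The 2-D feature vectors, `lam`, `lr` and `epochs` are likewise illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

def train_weighted_linear_svm(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    c: Sequence[float],
    lam: float = 0.01,
    lr: float = 0.5,
    epochs: int = 500,
) -> Tuple[List[float], float]:
    """Minimise lam/2*||w||^2 + (1/n) * sum_i c_i * max(0, 1 - y_i*(w.x_i + b))
    by full-batch subgradient descent with step size lr / sqrt(t).
    The bias b is not regularised."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for t in range(1, epochs + 1):
        eta = lr / math.sqrt(t)
        gw = [lam * wj for wj in w]  # gradient of the regulariser
        gb = 0.0
        for xi, yi, ci in zip(X, y, c):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1.0:  # hinge term is active: add its weighted subgradient
                for j in range(d):
                    gw[j] -= ci * yi * xi[j] / n
                gb -= ci * yi / n
        w = [wj - eta * g for wj, g in zip(w, gw)]
        b -= eta * gb
    return w, b

def decision(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Signed SVM score; > 0 means the video is predicted to show the event."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

# Hypothetical features: 2 true positives, 3 "related" videos, 4 negatives.
positives = [[2.0, 2.2], [2.4, 1.8]]
related = [[1.0, 0.6], [0.8, 1.2], [1.2, 1.0]]
negatives = [[-2.0, -1.5], [-1.6, -2.2], [-2.4, -2.0], [-1.8, -1.9]]

X = positives + related + negatives
y = [1] * len(positives) + [1] * len(related) + [-1] * len(negatives)
# Related videos count as positives but with a reduced, hand-set cost (assumption).
c = [1.0] * len(positives) + [0.3] * len(related) + [1.0] * len(negatives)

w, b = train_weighted_linear_svm(X, y, c)
```

Lowering a subset's $c_i$ shrinks how strongly its margin violations pull on $w$. With $c_i = 1$ for every sample the objective reduces to an ordinary linear SVM.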