Learning to Localize Temporal Events in Large-scale Video Data
This work targets video search applications by localizing events in time within videos, but the contribution is incremental: it builds on existing datasets and competition frameworks.
The paper tackles temporal event localization in large-scale video data using the YouTube-8M Segments dataset. A combination of gradient boosted decision trees and deep learning models achieved 5th place in the 3rd YouTube-8M video recognition challenge.
We address temporal localization of events in large-scale video data, in the context of the YouTube-8M Segments dataset. This emerging field within video recognition can enable applications to identify the precise time at which a specified event occurs in a video, which has broad implications for video search. To address this, we present two separate approaches: (1) a gradient boosted decision tree model on a crafted dataset and (2) a combination of deep learning models based on frame-level data, video-level data, and a localization model. The combination of these two approaches achieved 5th place in the 3rd YouTube-8M video recognition challenge.
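
The abstract does not say how the two approaches are combined or how a time is read off the result, so the following is only a minimal sketch under stated assumptions. It assumes each approach produces one score per fixed-length segment for a given event class, that the scores are blended by a simple weighted average, and that the event is placed in the highest-scoring segment. The function names `blend_scores` and `localize`, the `weight` parameter, and the 5-second default segment length are hypothetical choices for illustration, not the authors' method.

```python
from typing import List, Tuple


def blend_scores(gbdt_scores: List[float],
                 dnn_scores: List[float],
                 weight: float = 0.5) -> List[float]:
    """Weighted average of two per-segment score lists.

    `weight` is the share given to the first list. This blending rule is an
    assumption for illustration; the abstract does not specify one.
    """
    if len(gbdt_scores) != len(dnn_scores):
        raise ValueError("score lists must have the same length")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [weight * g + (1.0 - weight) * d
            for g, d in zip(gbdt_scores, dnn_scores)]


def localize(scores: List[float],
             segment_seconds: float = 5.0) -> Tuple[float, float]:
    """Return (start, end) in seconds of the highest-scoring segment.

    Segments are assumed to be contiguous, equal-length, and indexed from 0.
    The default segment length is an assumed value.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best * segment_seconds, (best + 1) * segment_seconds


if __name__ == "__main__":
    # Made-up scores for three consecutive segments of one video and one class.
    tree_scores = [0.0, 1.0, 0.5]
    deep_scores = [0.5, 0.5, 0.5]
    blended = blend_scores(tree_scores, deep_scores, weight=0.5)
    print(blended)
    print(localize(blended))
```

In practice the blending weight would be tuned on held-out data, and a ranking-based evaluation would use the full list of blended scores rather than only the top segment.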