CV, Mar 7, 2016

A novel learning-based frame pooling method for Event Detection

arXiv:1603.02078v2, 1 citation
Originality: Incremental advance
AI Analysis

This paper addresses video event detection for researchers and practitioners working with large video collections. Its contribution appears incremental, since it builds on existing pooling strategies.

The paper tackles the problem of pooling multiple frame features into a single representation for video event detection by proposing a learning-based frame pooling method that automatically learns optimal pooling weights for each event category. Experimental results on TRECVID MED 2011 show it outperforms average and max pooling strategies on both high-level and low-level 2D image features.

Detecting complex events in a large video collection crawled from video websites is a challenging task. When good image-based feature representations, e.g., HOG or SIFT, are applied directly to videos, we face the problem of how to pool multiple frame-level feature representations into a single feature representation. In this paper, we propose a novel learning-based frame pooling method. We formulate pooling weight learning as an optimization problem, so our method can automatically learn the best pooling weight configuration for each specific event category. Experiments conducted on TRECVID MED 2011 show that our method outperforms the commonly used average pooling and max pooling strategies on both high-level and low-level 2D image features.
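To make the pooling problem concrete, the sketch below shows one way a single weight vector can express both baselines named in the abstract. It assumes a parametrization in which, for each feature dimension, the frame values are sorted in descending order and combined with one weight per rank. That parametrization, and the names `weighted_sorted_pool`, `average_weights`, and `max_weights`, are illustrative assumptions and are not taken from the text above. The optimization that would learn the weights per event category is not shown.

```python
from typing import List, Sequence


def weighted_sorted_pool(frames: Sequence[Sequence[float]],
                         weights: Sequence[float]) -> List[float]:
    """Pool T frame vectors (each of dimension D) into one D-dimensional vector.

    Assumed parametrization (hypothetical, for illustration only): for every
    feature dimension, the T frame values are sorted in descending order and
    combined with T rank weights.
    """
    if not frames:
        raise ValueError("need at least one frame")
    if len(weights) != len(frames):
        raise ValueError("one weight per frame rank is required")
    dim = len(frames[0])
    pooled = []
    for d in range(dim):
        # Values of dimension d across all frames, largest first.
        column = sorted((frame[d] for frame in frames), reverse=True)
        pooled.append(sum(w * v for w, v in zip(weights, column)))
    return pooled


def average_weights(num_frames: int) -> List[float]:
    """Uniform rank weights reproduce average pooling."""
    return [1.0 / num_frames] * num_frames


def max_weights(num_frames: int) -> List[float]:
    """All weight on the top rank reproduces max pooling."""
    return [1.0] + [0.0] * (num_frames - 1)


# Three toy frames with two feature dimensions each (made-up values).
frames = [[0.2, 0.9], [0.6, 0.1], [0.4, 0.5]]
avg_pooled = weighted_sorted_pool(frames, average_weights(len(frames)))
max_pooled = weighted_sorted_pool(frames, max_weights(len(frames)))
```

Under this assumption, average and max pooling are two fixed points in the space of weight vectors, and a learned method would search that space for weights suited to each event category rather than committing to either extreme.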

