CVMar 23, 2017

Saliency-guided video classification via adaptively weighted learning

arXiv:1703.08025v21 citations
AI Analysis

This addresses the problem of indiscriminate frame modeling in video classification for applications like surveillance or entertainment, though it is incremental.

The paper tackles video classification by modeling salient and non-salient areas separately with different networks and adaptively weighting them per class, achieving state-of-the-art results.

Video classification is productive in many practical applications, and the recent deep learning has greatly improved its accuracy. However, existing works often model video frames indiscriminately, but from the view of motion, video frames can be decomposed into salient and non-salient areas naturally. Salient and non-salient areas should be modeled with different networks, for the former present both appearance and motion information, and the latter present static background information. To address this problem, in this paper, video saliency is predicted by optical flow without supervision firstly. Then two streams of 3D CNN are trained individually for raw frames and optical flow on salient areas, and another 2D CNN is trained for raw frames on non-salient areas. For the reason that these three streams play different roles for each class, the weights of each stream are adaptively learned for each class. Experimental results show that saliency-guided modeling and adaptively weighted learning can reinforce each other, and we achieve the state-of-the-art results.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes