CV · Feb 17, 2015

SA-CNN: Dynamic Scene Classification using Convolutional Neural Networks

arXiv:1502.05243v2 · 1 citation
AI Analysis

This paper addresses video classification challenges for computer vision applications, though it appears incremental: it combines existing aggregation methods with pre-trained CNNs.

The paper tackles dynamic scene classification in videos with moving cameras by applying statistical aggregation techniques to CNN activation features from multiple frames. The approach achieves state-of-the-art performance on Maryland and YUPenn datasets, effectively handling complex camera motion and scene dynamics.

The task of classifying videos of natural dynamic scenes into appropriate classes has gained a lot of attention in recent years. The problem becomes especially challenging when the camera used to capture the video is itself moving. In this paper, we analyse the performance of statistical aggregation (SA) techniques on various pre-trained convolutional neural network (CNN) models to address this problem. The proposed approach extracts CNN activation features for a number of frames in a video and then applies an aggregation scheme to obtain a robust feature descriptor for the video. Our results show that the proposed approach outperforms the state of the art on the Maryland and YUPenn datasets. The final descriptor is powerful enough to distinguish among dynamic scenes and can even handle the scenario where camera motion is dominant and the scene dynamics are complex. Further, this paper presents an extensive study of the performance of various aggregation methods and their combinations. We compare the proposed approach with other dynamic scene classification algorithms on two publicly available datasets, Maryland and YUPenn, to demonstrate its superior performance.
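The aggregation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not name the statistics it uses, so mean, max and standard deviation are assumptions here, as are the function name `aggregate_frame_features`, the 4096-dimensional feature size, and the use of random numbers in place of real CNN activations.

```python
import numpy as np

# Hypothetical set of aggregation statistics; the abstract does not
# specify which ones the paper evaluates.
AGGREGATORS = {
    "mean": lambda f: f.mean(axis=0),
    "max": lambda f: f.max(axis=0),
    "std": lambda f: f.std(axis=0),
}


def aggregate_frame_features(frame_features, methods=("mean", "max")):
    """Collapse a (num_frames, feature_dim) array into one video descriptor.

    Each selected statistic is computed across frames, and the results
    are concatenated, so the descriptor length is
    len(methods) * feature_dim.
    """
    frame_features = np.asarray(frame_features, dtype=np.float64)
    if frame_features.ndim != 2:
        raise ValueError("expected shape (num_frames, feature_dim)")
    parts = [AGGREGATORS[m](frame_features) for m in methods]
    return np.concatenate(parts)


# Random stand-in for per-frame CNN activations (30 frames, 4096 dims).
# A real pipeline would take these from a pre-trained network instead.
rng = np.random.default_rng(0)
features = rng.standard_normal((30, 4096))

descriptor = aggregate_frame_features(features, methods=("mean", "max", "std"))
print(descriptor.shape)  # (12288,)
```

The resulting fixed-length descriptor does not depend on the number of frames sampled, so videos of different lengths can be passed to any standard classifier. Combining several statistics by concatenation is one simple way to realise the "combinations" the abstract mentions.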

