CV · LG · ML · Feb 19, 2020

Human Action Recognition using Local Two-Stream Convolution Neural Network Features and Support Vector Machines

arXiv:2002.09423v1
AI Analysis

This work addresses action recognition for video analysis, but it is incremental as it builds on existing methods with minor enhancements.

The paper tackles human action recognition in video by extracting local appearance and motion features with 3D CNNs and classifying them with a linear SVM. It reports improved performance from two preprocessing steps, optical flow scaling and crop filling.

This paper proposes a simple yet effective method for human action recognition in video. The proposed method separately extracts local appearance and motion features from sampled snippets of a video using state-of-the-art three-dimensional convolutional neural networks. These local features are then concatenated to form global representations, which are used to train a linear SVM that performs action classification using the full context of the video, rather than the partial context used in previous works. The videos undergo two simple proposed preprocessing techniques, optical flow scaling and crop filling. We perform an extensive evaluation on three common benchmark datasets to empirically show the benefit of the SVM and the two preprocessing steps.
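The pipeline in the abstract (per-snippet features, concatenation into one global vector, then a linear SVM) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_video` is a hypothetical stand-in that returns random vectors in place of real 3D CNN outputs, the SVM is a small binary one trained by subgradient descent on the hinge loss (the paper's task is multi-class), and every function name and hyperparameter here is invented for the example.

```python
import random

def global_representation(appearance_feats, motion_feats):
    """Concatenate per-snippet appearance and motion features, in snippet order."""
    vec = []
    for app, mot in zip(appearance_feats, motion_feats):
        vec.extend(app)
        vec.extend(mot)
    return vec

def train_linear_svm(X, y, epochs=200, lr=0.01, lam=0.01, seed=0):
    """Binary linear SVM via subgradient descent on the L2-regularised hinge loss.
    Labels must be +1 or -1. Hyperparameters are arbitrary choices for the toy data."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            # Regularisation step: shrink the weights slightly.
            w = [wj * (1.0 - lr * lam) for wj in w]
            # Hinge-loss step: only samples inside the margin move the boundary.
            if y[i] * score < 1.0:
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b

def predict(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

def toy_video(label, rng, n_snippets=3, dim=2):
    """Hypothetical stand-in for CNN outputs: features clustered around the label."""
    app = [[label + rng.gauss(0, 0.3) for _ in range(dim)] for _ in range(n_snippets)]
    mot = [[label + rng.gauss(0, 0.3) for _ in range(dim)] for _ in range(n_snippets)]
    return app, mot

rng = random.Random(42)
labels = [1, -1] * 10  # two toy action classes
X = [global_representation(*toy_video(lbl, rng)) for lbl in labels]
w, b = train_linear_svm(X, labels)
predictions = [predict(w, b, x) for x in X]
```

Each global vector has length `n_snippets * 2 * dim`, so the classifier sees features from every sampled snippet at once. That is the "full context" idea in the abstract, in contrast to classifying each snippet separately.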

