CV · Mar 20, 2016

Modelling Temporal Information Using Discrete Fourier Transform for Video Classification

arXiv:1603.06182v51.1

Originality: Incremental advance

AI Analysis

This work addresses the challenge of temporal modeling in video classification for applications like emotion recognition and action recognition, representing an incremental improvement over existing methods.

The paper tackles the problem of modeling temporal information in video classification by proposing Discrete Fourier Transform (DFT) features that capture characteristics accumulated over time. It reports state-of-the-art performance on the VideoEmotion-8 dataset and competitive results on UCF-101.

Video classification has recently attracted intensive research effort. However, most existing works are based on frame-level visual features, which may fail to model temporal information, e.g. characteristics accumulated over time. To capture temporal information in video, we propose to analyse features in the frequency domain, obtained by the discrete Fourier transform (DFT features). Frame-level features are first extracted by a pre-trained deep convolutional neural network (CNN). Then, the time-domain features are transformed and interpolated into DFT features. The CNN and DFT features are further encoded using different pooling methods and fused for video classification. In this way, static image features extracted from a pre-trained deep CNN and temporal information represented by DFT features are jointly considered for video classification. We test our method on video emotion classification and action recognition. Experimental results demonstrate that incorporating DFT features can effectively capture temporal information and therefore improves the performance of both video emotion classification and action recognition. Our approach achieves state-of-the-art performance on the largest video emotion dataset (VideoEmotion-8) and competitive results on UCF-101.
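The pipeline described in the abstract (frame-level features, a DFT along the time axis, interpolation to a fixed size, pooling, fusion) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify these details, so the use of the magnitude spectrum, linear interpolation, the `num_bins` parameter, average pooling for the CNN features, flattening for the DFT features, and fusion by concatenation are all assumptions made for illustration. The function names `dft_features` and `video_descriptor` are hypothetical.

```python
import numpy as np


def dft_features(frame_feats, num_bins=16):
    """Map (T, D) frame-level features to a fixed-size (num_bins, D) spectrum.

    Assumption: we keep only the magnitude of the real-input DFT taken along
    the time axis, then linearly interpolate it to `num_bins` frequency bins
    so that videos of different lengths yield features of the same size.
    """
    frame_feats = np.asarray(frame_feats, dtype=np.float64)
    # DFT over time, computed independently for each feature dimension.
    spectrum = np.abs(np.fft.rfft(frame_feats, axis=0))  # (T // 2 + 1, D)

    # Resample every feature dimension onto a common normalised frequency grid.
    src = np.linspace(0.0, 1.0, spectrum.shape[0])
    dst = np.linspace(0.0, 1.0, num_bins)
    columns = [np.interp(dst, src, spectrum[:, d]) for d in range(spectrum.shape[1])]
    return np.stack(columns, axis=1)  # (num_bins, D)


def video_descriptor(frame_feats, num_bins=16):
    """Fuse static and temporal information into one vector.

    Assumption: the static part is the time-averaged frame feature, the
    temporal part is the flattened DFT feature, and fusion is concatenation.
    """
    frame_feats = np.asarray(frame_feats, dtype=np.float64)
    static = frame_feats.mean(axis=0)                      # (D,)
    temporal = dft_features(frame_feats, num_bins).ravel()  # (num_bins * D,)
    return np.concatenate([static, temporal])


# Stand-in for CNN outputs: two videos with different frame counts.
rng = np.random.default_rng(0)
short_video = rng.random((10, 4))   # 10 frames, 4-dimensional features
long_video = rng.random((37, 4))    # 37 frames, 4-dimensional features
desc_short = video_descriptor(short_video)
desc_long = video_descriptor(long_video)
```

Both descriptors have the same length regardless of the number of frames, so they could be passed to any standard classifier. In practice `frame_feats` would come from a pre-trained CNN rather than random numbers, and the pooling and fusion choices would need to follow the paper itself.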
