CV · LG · Mar 25, 2015

Initialization Strategies of Spatio-Temporal Convolutional Neural Networks

arXiv:1503.07274v1 · 7 citations
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of efficiently learning temporal representations for video analysis in computer vision. The advance is incremental, since it builds on existing methods.

The paper tackles the problem of incorporating temporal information from videos into spatial convolutional neural networks without training spatio-temporal ConvNets from scratch. It proposes strategies for initializing 3D convolutional weights from 2D weights pre-trained on ImageNet, and reports an improvement over spatial ConvNets on the UCF-101 dataset.

We propose a new way of incorporating temporal information present in videos into Spatial Convolutional Neural Networks (ConvNets) trained on images, that avoids training Spatio-Temporal ConvNets from scratch. We describe several initializations of weights in 3D Convolutional Layers of Spatio-Temporal ConvNet using 2D Convolutional Weights learned from ImageNet. We show that it is important to initialize 3D Convolutional Weights judiciously in order to learn temporal representations of videos. We evaluate our methods on the UCF-101 dataset and demonstrate improvement over Spatial ConvNets.
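The idea of seeding a 3D convolutional kernel from a 2D one can be sketched in a few lines. The abstract does not enumerate the specific initializations, so the two modes below ("middle" and "average") are illustrative assumptions rather than the paper's confirmed strategies. The function name `inflate_2d_to_3d` and the `(out_channels, in_channels, height, width)` weight layout are likewise hypothetical. Both modes are chosen so that summing the 3D kernel over its temporal axis recovers the 2D kernel, which means a clip of identical frames produces the same response as the original spatial filter.

```python
import numpy as np


def inflate_2d_to_3d(w2d, depth, mode="average"):
    """Build a 3D conv kernel from a 2D one (illustrative sketch only).

    w2d:   array of shape (out_channels, in_channels, kh, kw)  [assumed layout]
    depth: temporal extent of the 3D kernel
    mode:  "middle"  -> copy the 2D weights into the central temporal slice,
                        leaving all other slices at zero
           "average" -> spread the 2D weights evenly over all temporal slices
    Returns an array of shape (out_channels, in_channels, depth, kh, kw).
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    w2d = np.asarray(w2d, dtype=np.float32)
    out_c, in_c, kh, kw = w2d.shape
    w3d = np.zeros((out_c, in_c, depth, kh, kw), dtype=np.float32)
    if mode == "middle":
        w3d[:, :, depth // 2] = w2d
    elif mode == "average":
        # Broadcast the 2D kernel along the new temporal axis and rescale.
        w3d[:] = w2d[:, :, None] / depth
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return w3d


# Usage: inflate a random bank of 3x3 filters to temporal depth 5.
rng = np.random.default_rng(0)
pretrained = rng.standard_normal((8, 3, 3, 3)).astype(np.float32)
kernel_3d = inflate_2d_to_3d(pretrained, depth=5, mode="average")
```

With either mode the network starts out behaving like the spatial ConvNet on static input. How the temporal slices then diverge during fine-tuning depends on the initialization, which is the sensitivity the abstract points to.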

