CV · Sep 7, 2019

Exploring Temporal Differences in 3D Convolutional Neural Networks

arXiv:1909.03309v1 · 3 citations
Originality: Incremental advance
AI Analysis

This paper addresses efficient spatio-temporal modeling for video and 3D data analysis, offering a parameter-efficient alternative to 3D CNNs. It appears incremental because it builds on existing 2D and 3D CNN approaches.

The paper tackles the computational expense and overfitting of traditional 3D CNNs by proposing a convolutional block that uses a 2D convolution for spatial information and parameter-free temporal differences for temporal information. The block has n times fewer parameters than an n×n×n 3D convolution kernel, and the authors report improved performance on the UCF101 and ModelNet datasets.
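The factor of n follows from counting kernel weights. The short derivation below is a sketch that ignores bias terms; the channel counts C_in and C_out are symbols introduced here for illustration and do not appear in the summary.

```latex
\begin{align*}
P_{2\mathrm{D}} &= C_{\mathrm{in}}\, C_{\mathrm{out}}\, n^{2}, &
P_{3\mathrm{D}} &= C_{\mathrm{in}}\, C_{\mathrm{out}}\, n^{3}, &
\frac{P_{3\mathrm{D}}}{P_{2\mathrm{D}}} &= n .
\end{align*}
```

For n = 3, a single kernel has 9 weights in the 2D case and 27 in the 3D case, a reduction by a factor of 3.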

Traditional 3D convolutions are computationally expensive and memory intensive, and because of their large number of parameters they tend to overfit. 2D CNNs, on the other hand, are less computationally expensive and less memory intensive than 3D CNNs, and they have shown remarkable results in applications such as image classification and object recognition. However, previous work has observed that they are inferior to 3D CNNs when applied to spatio-temporal input. In this work, we propose a convolutional block that extracts spatial information by performing a 2D convolution and extracts temporal information by exploiting temporal differences, i.e., the change in the spatial information at different time instances, using the simple operations of shift, subtract and add, without any trainable parameters. The proposed convolutional block has the same number of parameters as a 2D convolution kernel of size n×n, i.e., n^2, and n times fewer parameters than an n×n×n 3D convolution kernel. We show that 3D CNNs perform better when their 3D convolution kernels are replaced by the proposed convolutional blocks. We evaluate the proposed convolutional block on the UCF101 and ModelNet datasets.
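The shift, subtract and add steps can be illustrated with a minimal pure-Python sketch. The abstract does not specify how the operations are combined, so everything here is an assumption made for illustration: the function names are hypothetical, the difference is a one-step backward difference, the first frame is repeated at the boundary, and the input is taken to be the per-frame feature maps already produced by the 2D convolution. It is not the authors' implementation.

```python
def shift_back(frames):
    """Shift along time by one step, so result[t] = frames[t - 1].

    Assumption: the first frame is repeated at the boundary, which makes
    its temporal difference zero. Expects at least one frame.
    """
    return [frames[0]] + frames[:-1]


def elementwise(a, b, op):
    """Apply op to matching elements of two equally shaped frame sequences."""
    return [
        [[op(x, y) for x, y in zip(row_a, row_b)]
         for row_a, row_b in zip(frame_a, frame_b)]
        for frame_a, frame_b in zip(a, b)
    ]


def temporal_difference_block(features):
    """Combine per-frame features with their temporal difference.

    features: list of T frames, each a 2D list (rows x columns) holding
    the output of a 2D convolution applied to that frame.
    No trainable parameters are used in this step.
    """
    previous = shift_back(features)                              # shift
    diff = elementwise(features, previous, lambda x, y: x - y)   # subtract
    return elementwise(features, diff, lambda x, y: x + y)       # add


# Three frames of 1x2 features: the content changes between frame 0 and
# frame 1, then stays the same.
features = [[[1.0, 2.0]], [[3.0, 5.0]], [[3.0, 5.0]]]
output = temporal_difference_block(features)
print(output)  # [[[1.0, 2.0]], [[5.0, 8.0]], [[3.0, 5.0]]]
```

In this sketch only the frame where the features change (frame 1) differs from its input, and a static sequence passes through unchanged. All learnable weights would sit in the 2D convolution that produces `features`.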

