Depthwise-STFT based separable Convolutional Neural Networks
This work targets the efficiency and accuracy of image classification and is relevant to both researchers and practitioners. Its contribution is incremental, since it builds on existing depthwise separable convolution methods.
The authors aim to improve convolutional neural networks for image classification by proposing a Depthwise-STFT Separable layer as an alternative to the standard depthwise separable layer. They report better performance on the CIFAR-10 and CIFAR-100 datasets with reduced space-time complexity.
In this paper, we propose a new convolutional layer, called the Depthwise-STFT Separable layer, that can serve as an alternative to the standard depthwise separable convolutional layer. The construction of the proposed layer is inspired by the fact that Fourier coefficients can accurately represent important image features such as edges. The layer computes Fourier coefficients channelwise in the 2D local neighborhood (e.g., 3x3) of each position of the input map and uses them as feature maps. The coefficients are computed with the 2D Short-Time Fourier Transform (STFT) at multiple fixed low-frequency points in the local neighborhood of each position. The feature maps obtained at the different frequency points are then linearly combined using trainable pointwise (1x1) convolutions. We show that the proposed layer outperforms models based on the standard depthwise separable layer on the CIFAR-10 and CIFAR-100 image classification datasets, with reduced space-time complexity.
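To make the two stages of the layer concrete, the following is a minimal NumPy sketch of the forward pass, not the reference implementation. Several details are assumptions made for illustration, since the abstract does not fix them: the four frequency points `(a, 0)`, `(0, a)`, `(a, a)`, `(a, -a)` with `a = 1/k`, zero padding at the borders, the use of cross-correlation (as is common in deep learning frameworks), and the hypothetical function names `stft_filters`, `depthwise_stft`, and `pointwise`. The real and imaginary parts of each coefficient are kept as separate feature maps, and the pointwise weights, which would be trained in practice, are random here.

```python
import numpy as np


def stft_filters(k=3):
    """Fixed (non-trainable) filters of shape (2 * n_freqs, k, k)."""
    r = k // 2
    ys, xs = np.mgrid[-r:r + 1, -r:r + 1]
    a = 1.0 / k
    # Assumed low-frequency points; the abstract does not specify them.
    freqs = [(a, 0.0), (0.0, a), (a, a), (a, -a)]
    filters = []
    for vy, vx in freqs:
        phase = 2.0 * np.pi * (vy * ys + vx * xs)
        filters.append(np.cos(phase))   # real part of exp(-j * phase)
        filters.append(-np.sin(phase))  # imaginary part of exp(-j * phase)
    return np.stack(filters)


def depthwise_stft(x, k=3):
    """Apply the fixed filters to every channel independently.

    x has shape (C, H, W); the result has shape (C * 2 * n_freqs, H, W).
    Borders are zero padded (an assumption of this sketch).
    """
    filt = stft_filters(k)
    r = k // 2
    c, h, w = x.shape
    xp = np.pad(x, ((0, 0), (r, r), (r, r)))
    # windows[c, h, w] is the k x k neighborhood centered at (h, w).
    windows = np.lib.stride_tricks.sliding_window_view(xp, (k, k), axis=(1, 2))
    out = np.einsum("chwij,fij->cfhw", windows, filt)
    return out.reshape(c * filt.shape[0], h, w)


def pointwise(x, weights):
    """1x1 convolution: x is (C_in, H, W), weights is (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weights, x)


# Usage: 3 input channels -> 24 fixed STFT maps -> 16 output channels.
rng = np.random.default_rng(0)
image = rng.standard_normal((3, 32, 32))
features = depthwise_stft(image)            # shape (24, 32, 32)
weights = rng.standard_normal((16, 24))     # trainable in a real model
output = pointwise(features, weights)       # shape (16, 32, 32)
```

In this sketch only `weights` would carry trainable parameters; the spatial filters are fixed, which is consistent with the reduced space complexity claimed above, although the exact savings depend on details not given in the abstract.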