About Pyramid Structure in Convolutional Neural Networks
This work addresses the need for more efficient CNN designs for computer vision applications, though it appears incremental as it builds on existing models with structural modifications.
The paper investigates whether pyramid structures, inspired by biological neurons, can reduce the number of learnable parameters in convolutional neural networks without sacrificing performance. It reports a parameter reduction of more than 80% for Caffe_LENET and of 10-40% for AlexNet and its variations (the latter trained with 10-20% less data), while maintaining competitive results on MNIST, Cifar-10, Cifar-100, and ImageNet-12.
Deep convolutional neural networks (CNNs) have undoubtedly revolutionized various challenging tasks, mainly in computer vision. However, their model design still requires attention to reduce the number of learnable parameters without a meaningful reduction in performance. In this paper we investigate to what extent CNNs may take advantage of the pyramid structure typical of biological neurons. A generalized statement over convolutional layers, from the input to the fully connected layer, is introduced that further helps in understanding and designing a successful deep network. It reduces ambiguity, the number of parameters, and their size on disk without degrading overall accuracy. Results are shown on state-of-the-art models for the MNIST, Cifar-10, Cifar-100, and ImageNet-12 datasets. Despite a reduction of more than 80% in parameters for Caffe_LENET, challenging results are obtained. Further, despite a 10-20% reduction in training data along with a 10-40% reduction in parameters for the AlexNet model and its variations, competitive results are achieved when compared to similar well-engineered deeper architectures.
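To make the parameter argument concrete, the following is a minimal sketch of how a parameter count responds to the ordering of layer widths. It is not the authors' configuration: the layer widths, the 5x5 kernel, the 4x4 final feature map, and the 500-unit fully connected layer are illustrative assumptions, and "pyramid" is assumed here to mean that the number of feature maps shrinks from the input towards the fully connected layer. The helper names (`conv_params`, `model_params`, `reduction_percent`) are hypothetical.

```python
def conv_params(in_channels, out_channels, kernel_size):
    """Weights plus biases of one 2D convolutional layer."""
    return kernel_size * kernel_size * in_channels * out_channels + out_channels


def dense_params(in_features, out_features):
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features


def model_params(input_channels, widths, kernel_size, final_map_size, fc_units):
    """Total parameters of a conv stack followed by one fully connected layer.

    widths lists the number of feature maps per convolutional layer.
    final_map_size is the side length of the last feature map (assumed square).
    """
    total = 0
    in_channels = input_channels
    for out_channels in widths:
        total += conv_params(in_channels, out_channels, kernel_size)
        in_channels = out_channels
    flattened = in_channels * final_map_size * final_map_size
    total += dense_params(flattened, fc_units)
    return total


def reduction_percent(baseline, candidate):
    """Relative parameter saving of candidate with respect to baseline."""
    return 100.0 * (baseline - candidate) / baseline


# Illustrative (assumed) widths, not taken from the paper.
baseline_widths = [20, 50]  # feature maps grow with depth
pyramid_widths = [50, 20]   # feature maps shrink with depth

baseline_total = model_params(1, baseline_widths, 5, 4, 500)
pyramid_total = model_params(1, pyramid_widths, 5, 4, 500)

print("baseline parameters:", baseline_total)
print("pyramid parameters: ", pyramid_total)
print("reduction: %.1f%%" % reduction_percent(baseline_total, pyramid_total))
```

In this toy setting the convolutional layers alone cost about the same in either ordering; the saving comes from the fully connected layer, whose input size scales with the number of feature maps in the last convolutional layer. Whether accuracy is preserved is an empirical question that a count like this cannot answer.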