Residual Squeeze VGG16
This work addresses the need for efficient network architectures in computer vision, though it appears incremental as it combines existing methods like Fire Modules and residual connections.
The authors tackle the problem of designing deep neural networks that are small in size and quick to train. They propose Residual-Squeeze-VGG16, which achieved accuracy similar to VGG16 on the MIT Places365-Standard dataset while reducing training time by 23.86% and model size by 88.4%.
Deep learning has ushered in a new era of machine learning, in computer vision and beyond. Convolutional neural networks have been applied to image classification, segmentation, and object detection. Despite recent advances, the field is still at a very early stage and has yet to settle on best practices for designing network architectures that are deep, small in size, and fast to train. In this work, we propose a very deep neural network comprising 16 convolutional layers compressed with the Fire Module adapted from the SqueezeNet model. We also add residual connections to help suppress degradation. This approach can be applied to almost any neural network model with fully incorporated residual learning. The proposed model, Residual-Squeeze-VGG16 (ResSquVGG16), was trained from scratch on the large-scale MIT Places365-Standard scene dataset. In our tests, it achieved Top-1 and Top-5 validation accuracy similar to that of the pre-trained VGG16 model, with a 23.86% reduction in training time and an 88.4% reduction in size.
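To make the size argument concrete, the following minimal sketch compares the parameter count of a standard 3x3 convolution with that of a SqueezeNet-style Fire Module (a 1x1 squeeze layer followed by parallel 1x1 and 3x3 expand layers whose outputs are concatenated). The function names and the channel sizes (256 in, 32 squeeze, 128 + 128 expand) are illustrative assumptions, not the configuration used in ResSquVGG16, and the per-layer saving it prints should not be read as the model-level 88.4% figure.

```python
def conv3x3_params(in_ch: int, out_ch: int) -> int:
    """Weights plus biases of a standard 3x3 convolution."""
    return 3 * 3 * in_ch * out_ch + out_ch


def fire_params(in_ch: int, squeeze: int, expand1x1: int, expand3x3: int) -> int:
    """Weights plus biases of a SqueezeNet-style Fire Module.

    The module outputs expand1x1 + expand3x3 channels (concatenated).
    """
    squeeze_layer = 1 * 1 * in_ch * squeeze + squeeze
    expand_1x1 = 1 * 1 * squeeze * expand1x1 + expand1x1
    expand_3x3 = 3 * 3 * squeeze * expand3x3 + expand3x3
    return squeeze_layer + expand_1x1 + expand_3x3


if __name__ == "__main__":
    # Hypothetical channel sizes chosen only for illustration.
    in_ch, out_ch = 256, 256
    baseline = conv3x3_params(in_ch, out_ch)
    fire = fire_params(in_ch, squeeze=32, expand1x1=128, expand3x3=128)
    print(f"3x3 conv:    {baseline} parameters")
    print(f"Fire Module: {fire} parameters")
    print(f"reduction:   {100 * (1 - fire / baseline):.1f}%")
```

In this sketch the Fire Module emits the same number of channels as it receives (128 + 128 = 256), which is the case in which an identity residual shortcut can be added around the module without a projection layer.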