Algebraic Representations for Faster Predictions in Convolutional Neural Networks
This work addresses prediction-time efficiency for practitioners who use deep CNNs in computer vision, though it appears incremental because it builds on existing skip-connection techniques.
The paper tackles the computational overhead of deep convolutional neural networks with skip connections. It shows that trained linear CNNs with skip connections can be simplified into a single-layer model, which reduces prediction-time computation. It also introduces a method for training nonlinear models whose skip connections are gradually removed during training. Both results are demonstrated on ResNet architectures.
Convolutional neural networks (CNNs) are a popular choice of model for tasks in computer vision. When CNNs are made with many layers, resulting in a deep neural network, skip connections may be added to create an easier gradient optimization problem while retaining model expressiveness. In this paper, we show that arbitrarily complex, trained, linear CNNs with skip connections can be simplified into a single-layer model, resulting in greatly reduced computational requirements during prediction time. We also present a method for training nonlinear models with skip connections that are gradually removed throughout training, giving the benefits of skip connections without requiring computational overhead during prediction time. These results are demonstrated with practical examples on Residual Network (ResNet) architectures.
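To make the first claim concrete, the following is a minimal sketch of how a stack of linear residual blocks can collapse into one convolution. It is not the authors' implementation. It assumes 1D signals, circular boundary conditions, and no biases or normalization layers, and the names circ_conv, residual_forward, and collapse are hypothetical helpers introduced only for illustration. The idea is that a linear block h + conv(h, k) equals a single convolution with the kernel (delta + k), and that consecutive convolutions compose into one.

```python
import numpy as np

def circ_conv(x, k):
    """Circular 1D convolution of x with kernel k (k is zero-padded to len(x))."""
    n = len(x)
    k_padded = np.zeros(n)
    k_padded[:len(k)] = k
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(k_padded)))

def residual_forward(x, kernels):
    """Deep model: a stack of linear residual blocks, h <- h + conv(h, k)."""
    h = x
    for k in kernels:
        h = h + circ_conv(h, k)
    return h

def collapse(kernels, n):
    """Fold every residual block into one length-n kernel (assumed setting only)."""
    delta = np.zeros(n)
    delta[0] = 1.0  # identity kernel: circ_conv(x, delta) == x
    combined = delta.copy()
    for k in kernels:
        block = delta.copy()
        block[:len(k)] += k  # the skip connection adds the identity to the kernel
        combined = circ_conv(combined, block)  # compose with the blocks so far
    return combined

rng = np.random.default_rng(0)
n = 32
x = rng.standard_normal(n)
kernels = [0.1 * rng.standard_normal(3) for _ in range(5)]

deep = residual_forward(x, kernels)        # five convolutions plus five additions
single_kernel = collapse(kernels, n)       # computed once, after training
single = circ_conv(x, single_kernel)       # one convolution at prediction time
max_abs_diff = float(np.max(np.abs(deep - single)))
```

In this sketch, deep and single should agree up to floating-point rounding. The collapsed kernel has wider support than any individual layer (here 11 taps from five 3-tap blocks), so the saving comes from replacing many layers with one rather than from a smaller kernel. The equivalence relies on linearity and does not carry over to blocks with nonlinear activations.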