Constructing Fast Network through Deconstruction of Convolution
This work addresses the need for efficient neural networks in resource-limited environments like mobile applications, offering a novel method to reduce computational costs while maintaining performance.
The paper tackles the problem of heavy resource costs in convolutional neural networks for vision tasks by deconstructing naive convolution into a shift operation and pointwise convolution, proposing an active shift layer (ASL) that learns shift parameters end-to-end, and applying it to a network that surpasses existing state-of-the-art networks in speed and efficiency.
Convolutional neural networks have achieved great success in various vision tasks; however, they incur heavy resource costs. By using deeper and wider networks, network accuracy can be improved rapidly. However, in an environment with limited resources (e.g., mobile applications), heavy networks may not be usable. This study shows that naive convolution can be deconstructed into a shift operation and pointwise convolution. To cope with various convolutions, we propose a new shift operation called active shift layer (ASL) that formulates the amount of shift as a learnable function with shift parameters. This new layer can be optimized end-to-end through backpropagation and it can provide optimal shift values. Finally, we apply this layer to a light and fast network that surpasses existing state-of-the-art networks.