Minimal Filtering Algorithms for Convolutional Neural Networks
This work addresses hardware efficiency for CNN implementations, but it is incremental as it builds on existing Winograd techniques.
The paper addresses the resource-efficient hardware implementation of filtering operations in convolutional neural networks. It develops fully parallel algorithms based on Winograd minimal filtering for filter sizes M = 3, 5, 7, 9, and 11, saving approximately 30% of the embedded multipliers compared with naive methods.
In this paper, we present several resource-efficient algorithmic solutions for the fully parallel hardware implementation of the basic filtering operation performed in the convolutional layers of convolutional neural networks. Each basic operation computes two inner products: two neighboring vectors, formed by a sliding time window over the current data stream, are each multiplied with the impulse response of an M-tap finite impulse response (FIR) filter. We apply the Winograd minimal filtering trick to develop fully parallel, hardware-oriented algorithms for this operation for M = 3, 5, 7, 9, and 11. In each case, a fully parallel hardware implementation of the proposed algorithm requires approximately 30 percent fewer embedded multipliers than a fully parallel hardware implementation of the naive calculation method.
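As an illustration of the basic operation for the smallest case, M = 3, the sketch below compares the naive computation of two neighboring inner products (6 multiplications) with the standard Winograd F(2,3) minimal filtering formulas (4 multiplications). This is the textbook F(2,3) scheme, not necessarily the exact data flow proposed in the paper, and the paper's algorithms for M = 5, 7, 9, and 11 are not reproduced here. The function names naive_two_outputs and winograd_f23 are hypothetical, and the sketch assumes the filter coefficients are fixed so that the filter-side terms can be precomputed.

```python
from fractions import Fraction


def naive_two_outputs(d, g):
    """Two neighboring 3-tap inner products, computed directly.

    d: four consecutive input samples, g: three filter taps.
    Uses 6 multiplications.
    """
    y0 = d[0] * g[0] + d[1] * g[1] + d[2] * g[2]
    y1 = d[1] * g[0] + d[2] * g[1] + d[3] * g[2]
    return y0, y1


def winograd_f23(d, g):
    """The same two outputs via the standard Winograd F(2,3) formulas.

    Uses 4 data-dependent multiplications (m1..m4).
    """
    # Filter-side terms: assumed precomputed once for a fixed filter,
    # so they are not counted among the per-output multiplications.
    half = Fraction(1, 2)  # exact arithmetic, to keep the check exact
    u1 = (g[0] + g[1] + g[2]) * half
    u2 = (g[0] - g[1] + g[2]) * half

    # The four multiplications on the data path.
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * u1
    m3 = (d[2] - d[1]) * u2
    m4 = (d[1] - d[3]) * g[2]

    # Output additions.
    y0 = m1 + m2 + m3
    y1 = m2 - m3 - m4
    return y0, y1


if __name__ == "__main__":
    d = [1, 2, 3, 4]
    g = [5, 6, 7]
    print(naive_two_outputs(d, g))
    print(winograd_f23(d, g))
```

For M = 3 this replaces 6 multipliers with 4, a reduction of one third, which is in line with the roughly 30 percent savings stated above; the cost is a few extra additions on the data path.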