Parsimonious Inference on Convolutional Neural Networks: Learning and applying on-line kernel activation rules
This work addresses the need for efficient CNN deployment on resource-constrained devices such as mobile platforms, and presents a novel method rather than an incremental improvement.
The paper tackles the problem of reducing the computational load of CNN inference by introducing a new structural element that lets a CNN change size and form in real time. The result is a speed-up of up to 3x on mobile platforms while accuracy is preserved or improved.
This paper presents a new, radical CNN design approach aimed at reducing the total computational load during inference. It relies on a holistic intervention on both the CNN architecture and the training procedure, which targets parsimonious inference by learning to exploit or remove the redundant capacity of a CNN architecture. This is accomplished by introducing a new structural element that can be inserted as an add-on to any contemporary CNN architecture while preserving or even improving its recognition accuracy. Our approach formulates a systematic, data-driven method for developing CNNs that are trained to change size and form in real time during inference, targeting the smallest possible computational footprint. Results for an optimized implementation on a few modern, high-end mobile computing platforms indicate a significant speed-up of up to 3x.
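The abstract does not specify how the add-on element decides which kernels to run. Purely as an illustration, the Python sketch below assumes a generic channel-gating scheme. A small gating head average-pools the input feature maps and gives each convolution kernel a linear score. Kernels with a non-positive score are switched off for that input and never computed. The names `kernel_switches` and `gated_conv2d`, the pooling-plus-linear gate, and the hand-picked gate weights are all hypothetical and are not the paper's module. The sketch also assumes square kernels, stride 1 and no padding, and it covers inference only. Training the gate is not shown.

```python
from typing import List, Tuple

Tensor3 = List[List[List[float]]]  # indexed as [channel][row][col]


def global_avg_pool(x: Tensor3) -> List[float]:
    """Average each input channel over its spatial extent."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]


def kernel_switches(x: Tensor3, gate_w: List[List[float]],
                    gate_b: List[float]) -> List[bool]:
    """Hypothetical gating head: one linear score per output kernel.

    A positive logit (equivalently, sigmoid > 0.5) switches the kernel on
    for this particular input.
    """
    pooled = global_avg_pool(x)
    return [sum(w * p for w, p in zip(row, pooled)) + b > 0.0
            for row, b in zip(gate_w, gate_b)]


def gated_conv2d(x: Tensor3, kernels: List[Tensor3],
                 switches: List[bool]) -> Tuple[Tensor3, int]:
    """Valid convolution (stride 1, no padding) that skips inactive kernels.

    Returns the output feature maps and the number of multiply-accumulates
    (MACs) actually performed.
    """
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    k = len(kernels[0][0])  # assumes square k x k kernels
    out_h, out_w = h - k + 1, w - k + 1
    out: Tensor3 = []
    macs = 0
    for kernel, active in zip(kernels, switches):
        if not active:
            # Switched-off kernel: no arithmetic, emit an all-zero map.
            out.append([[0.0] * out_w for _ in range(out_h)])
            continue
        fmap = []
        for i in range(out_h):
            row = []
            for j in range(out_w):
                acc = 0.0
                for c in range(c_in):
                    for di in range(k):
                        for dj in range(k):
                            acc += kernel[c][di][dj] * x[c][i + di][j + dj]
                            macs += 1
                row.append(acc)
            fmap.append(row)
        out.append(fmap)
    return out, macs


# Toy example: 2 input channels of size 4x4, four 3x3 kernels (all ones).
x = [[[1.0] * 4 for _ in range(4)], [[2.0] * 4 for _ in range(4)]]
kernels = [[[[1.0] * 3 for _ in range(3)] for _ in range(2)] for _ in range(4)]
# Hand-picked (hypothetical) gate weights: kernels 0 and 2 score positive.
gate_w = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
gate_b = [0.0, 0.0, 0.0, 0.0]

switches = kernel_switches(x, gate_w, gate_b)
out, macs = gated_conv2d(x, kernels, switches)
print(switches)  # [True, False, True, False]
print(macs)      # 144 (a dense layer would need 288)
```

With two of the four kernels switched off, this toy layer performs half the multiply-accumulates of its dense counterpart. In a learned scheme, the savings would depend on how many kernels the activation rules deactivate for each input.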