Learning Instance-wise Sparsity for Accelerating Deep Models
This work addresses the need for low-memory and high-efficiency deep models for various machine learning tasks, offering an incremental improvement over existing parameter-based acceleration methods.
The paper tackles the problem of accelerating deep convolutional neural networks by developing an instance-wise feature pruning method that identifies and eliminates subtle features for different input instances, achieving efficient inference while preserving overall network performance.
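To make the idea of eliminating subtle features per instance concrete, here is a minimal NumPy sketch. It assumes, for illustration only, that a channel of one instance's feature map counts as "subtle" when its mean absolute activation falls below a fixed threshold; the function names, the channel-level granularity, and the threshold are hypothetical and not taken from the paper. The second function shows where the saving comes from: a following 1x1 convolution only needs to read the channels that were kept.

```python
import numpy as np


def prune_subtle_features(feature_map, threshold):
    """Zero out the subtle channels of a single instance's feature map.

    feature_map: array of shape (C, H, W) for ONE input instance.
    threshold:   hypothetical saliency cut-off (an assumption, not from the paper).
    Returns the pruned feature map and a boolean mask of kept channels.
    """
    # Per-channel saliency: mean absolute activation over spatial positions.
    saliency = np.abs(feature_map).mean(axis=(1, 2))
    keep = saliency >= threshold
    # Dropped channels are set to zero; the mask can differ from instance to instance.
    pruned = feature_map * keep[:, None, None]
    return pruned, keep


def pointwise_conv_kept(weight, feature_map, keep):
    """1x1 convolution that reads only the kept input channels.

    weight: (C_out, C_in); feature_map: (C_in, H, W); keep: (C_in,) bool mask.
    Skipping dropped channels removes their multiply-adds entirely.
    """
    _, h, w = feature_map.shape
    kept = feature_map[keep].reshape(int(keep.sum()), h * w)
    out = weight[:, keep] @ kept
    return out.reshape(weight.shape[0], h, w)


if __name__ == "__main__":
    x = np.ones((4, 8, 8))
    x[1] *= 0.01  # channel 1 carries only weak responses for this instance
    pruned, keep = prune_subtle_features(x, threshold=0.1)
    print(keep)  # [ True False  True  True]
    rng = np.random.default_rng(0)
    weight = rng.normal(size=(3, 4))
    # The dense convolution over the pruned map matches the cheaper one.
    dense = (weight @ pruned.reshape(4, -1)).reshape(3, 8, 8)
    print(np.allclose(dense, pointwise_conv_kept(weight, x, keep)))  # True
```

Because the mask is computed from each input's own activations, two images passing through the same layer may keep different channels, which is what distinguishes this from static filter pruning.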
Exploring deep convolutional neural networks with high efficiency and low memory usage is essential for a wide variety of machine learning tasks. Most existing approaches accelerate deep models by manipulating parameters or filters independently of the data, e.g., through pruning and decomposition. In contrast, we study this problem from a different perspective by taking into account the differences between input instances. We develop an instance-wise feature pruning method that identifies the informative features of each instance. Specifically, by imposing a feature decay regularization, we encourage the intermediate feature maps of each instance in a deep neural network to be sparse while preserving the overall network performance. During online inference, the subtle features that the intermediate layers of a well-trained network extract from an input image can be eliminated to accelerate the subsequent computations. We further use the coefficient of variation as a measure for selecting the layers that are suitable for acceleration. Extensive experiments on benchmark datasets and networks demonstrate the effectiveness of the proposed method.
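The feature decay regularization and the coefficient-of-variation layer selection can be sketched as follows. This is a minimal NumPy sketch under stated assumptions: the feature decay regularization is modelled as an L1-style penalty with a hypothetical weight `lam` on the intermediate feature maps, added to the task loss; the per-layer coefficient of variation (CV = standard deviation / mean) is computed, as an assumption, over the per-channel mean absolute activations of a batch of instances; and ranking layers by higher CV is an illustrative rule. The paper's exact statistic and selection criterion may differ.

```python
import numpy as np


def feature_decay_penalty(feature_maps, lam):
    """Hypothetical feature decay term: lam * sum of mean |activation|.

    feature_maps: list of intermediate activations (any shapes).
    During training it would be added to the task loss, e.g.
    total_loss = task_loss + feature_decay_penalty(maps, lam),
    so that minimizing the loss pushes intermediate features toward zero.
    """
    return lam * sum(float(np.abs(f).mean()) for f in feature_maps)


def layer_instance_cv(acts, eps=1e-12):
    """Coefficient of variation of channel saliency across instances.

    acts: activations of one layer, shape (N, C, H, W) for N instances.
    For each channel, CV = std / mean of its mean |activation| over the N
    instances; the layer score is the average over channels (an assumption).
    """
    saliency = np.abs(acts).mean(axis=(2, 3))  # shape (N, C)
    cv = saliency.std(axis=0) / (saliency.mean(axis=0) + eps)
    return float(cv.mean())


def select_layers(layer_acts, k):
    """Return indices of the k layers with the highest CV (illustrative rule)."""
    scores = [layer_instance_cv(a) for a in layer_acts]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
```

Under this assumption, a layer whose channel responses are identical for every instance scores a CV of 0 and offers little room for instance-wise pruning, whereas a layer whose responses vary from one input to the next scores higher and is selected first.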