PCAS: Pruning Channels with Attention Statistics for Deep Network Compression
This work addresses the challenge of implementing deep neural networks on small embedded devices by improving channel pruning. The contribution is incremental, since it builds on existing compression techniques.
Many channel-pruning methods require the compression ratio to be set manually for each layer. The authors address this by proposing a simple technique based on attention statistics, with automatic channel selection driven by a single compression ratio for the entire model. They report better accuracy and lower computational cost than conventional methods across various models and datasets.
Compression techniques for deep neural networks are important for implementing them on small embedded devices. In particular, channel pruning is a useful technique for realizing compact networks. However, many conventional methods require compression ratios to be set manually for each layer, and it is difficult to analyze the relationships between all layers, especially in deeper models. To address these issues, we propose a simple channel-pruning technique based on attention statistics, which make it possible to evaluate the importance of channels. We further improve the method with a criterion for automatic channel selection that uses a single compression ratio for the entire model in place of per-layer analysis. The proposed approach achieved superior performance over conventional methods with respect to accuracy and computational cost for various models and datasets. We also analyze the behavior of the proposed criterion on different datasets to demonstrate its favorable properties for channel pruning.
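To make the idea of a single model-wide compression ratio concrete, the following is a minimal sketch of global channel selection. It is not the criterion proposed in the paper: the abstract does not specify how the attention statistics are computed or how channels are ranked. The sketch assumes that each channel already has a scalar importance score (for example, an attention value averaged over training data), and the names `select_channels`, `prune_ratio`, and `min_keep` are hypothetical.

```python
def select_channels(scores_per_layer, prune_ratio, min_keep=1):
    """Pick channels to keep using one pruning ratio for the whole model.

    scores_per_layer: list of lists; scores_per_layer[l][c] is an assumed
        importance score for channel c of layer l (higher = more important).
    prune_ratio: fraction of all channels in the model to remove, in [0, 1).
    min_keep: assumed safeguard so that no layer loses all its channels.
    Returns, per layer, the sorted indices of the channels that are kept.
    """
    if not 0.0 <= prune_ratio < 1.0:
        raise ValueError("prune_ratio must be in [0, 1)")

    # Rank every channel in the model together, least important first.
    ranked = sorted(
        (score, layer, ch)
        for layer, scores in enumerate(scores_per_layer)
        for ch, score in enumerate(scores)
    )

    n_prune = int(len(ranked) * prune_ratio)
    remaining = [len(scores) for scores in scores_per_layer]
    pruned = [set() for _ in scores_per_layer]

    # Remove the globally weakest channels, skipping layers at the floor.
    for _score, layer, ch in ranked:
        if n_prune == 0:
            break
        if remaining[layer] > min_keep:
            pruned[layer].add(ch)
            remaining[layer] -= 1
            n_prune -= 1

    return [
        [ch for ch in range(len(scores)) if ch not in pruned[layer]]
        for layer, scores in enumerate(scores_per_layer)
    ]


# Toy example: three layers with made-up importance scores.
toy_scores = [[0.9, 0.1, 0.5, 0.7], [0.2, 0.8], [0.05, 0.03]]
kept = select_channels(toy_scores, prune_ratio=0.5)
```

Because all channels are ranked together, the per-layer pruning rates fall out of the scores rather than being set by hand, which is the property the abstract attributes to using a single ratio. The `min_keep` floor is an added assumption to keep every layer connected.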