CV | AI | Dec 17, 2022

FSCNN: A Fast Sparse Convolution Neural Network Inference System

arXiv:2212.08815v1 | 3 citations | h-index: 8
Originality: Incremental advance
AI Analysis

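This work addresses inference for CNNs compressed with fine-grained sparsity, but the advance is incremental: the authors themselves report that their sparse operators are typically not competitive with highly optimized dense operators, and they recommend structured sparsity instead.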

The authors tackle the high computational cost and redundant parameters of CNNs by developing FSCNN, a fast inference system that exploits fine-grained sparsity. FSCNN achieves speedups over PyTorch on models such as VGG16 when sparsity is sufficiently high, but it is typically not competitive with highly optimized dense operators because of the contiguity issue of sparse operators.

Convolutional neural networks (CNNs) have achieved remarkable success, but they typically come with high computational cost and numerous redundant weight parameters. To reduce FLOPs, structured pruning is a popular approach that removes entire hidden structures by introducing coarse-grained sparsity. Meanwhile, many pruning works instead leverage fine-grained sparsity (where the zeros are randomly distributed), but their sparse models lack a specially designed computing library to realize the potential speedup. In this technical report, we study and present an efficient convolutional neural network inference system that accelerates the forward pass by exploiting the fine-grained sparsity of compressed CNNs. FSCNN is built on a set of specially designed sparse data structures, operators, and associated algorithms. Experimentally, we validate that FSCNN outperforms the standard deep learning library PyTorch on popular CNN architectures such as VGG16 when the sparsity is sufficiently high. However, due to the contiguity issue of sparse operators, FSCNN is typically not competitive with highly optimized dense operators. Therefore, we recommend coarse-grained (structured) sparsity for generic model compression.
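To make the idea of a sparse convolution operator concrete, here is a minimal NumPy sketch. It is not FSCNN's actual data structure or algorithm, which the abstract does not describe; it assumes a generic CSR (compressed sparse row) layout for the pruned filter weights and an im2col unfolding of the input, with stride 1 and no padding. All function names (`to_csr`, `im2col`, `sparse_conv2d`) are hypothetical and chosen for illustration only.

```python
import numpy as np

def to_csr(dense):
    """Convert a 2-D array to CSR arrays: (values, column indices, row pointers)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        nz = np.nonzero(row)[0]
        values.extend(row[nz])
        col_idx.extend(nz)
        row_ptr.append(len(values))
    return (np.array(values, dtype=dense.dtype),
            np.array(col_idx, dtype=np.int64),
            np.array(row_ptr, dtype=np.int64))

def im2col(x, kh, kw):
    """Unfold a (C, H, W) input into a (C*kh*kw, out_h*out_w) matrix (stride 1, no padding)."""
    c, h, w = x.shape
    out_h, out_w = h - kh + 1, w - kw + 1
    cols = np.empty((c * kh * kw, out_h * out_w), dtype=x.dtype)
    r = 0
    for ci in range(c):
        for i in range(kh):
            for j in range(kw):
                cols[r] = x[ci, i:i + out_h, j:j + out_w].reshape(-1)
                r += 1
    return cols

def sparse_conv2d(values, col_idx, row_ptr, x, kh, kw):
    """Convolve x with CSR-encoded filters, touching only the nonzero weights."""
    cols = im2col(x, kh, kw)
    n_out = len(row_ptr) - 1
    out = np.zeros((n_out, cols.shape[1]), dtype=x.dtype)
    for o in range(n_out):
        start, end = row_ptr[o], row_ptr[o + 1]
        # Gather the rows of `cols` that match nonzero weights. This gather is
        # non-contiguous in memory, unlike a dense matrix multiply.
        out[o] = values[start:end] @ cols[col_idx[start:end]]
    out_h = x.shape[1] - kh + 1
    return out.reshape(n_out, out_h, -1)

# Demo: magnitude-prune roughly 90% of the weights, then compare with the dense result.
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8, 8))       # input: 3 channels, 8x8
w = rng.standard_normal((8, 3, 3, 3))    # 8 filters of shape 3x3x3
threshold = np.quantile(np.abs(w), 0.9)
w[np.abs(w) < threshold] = 0.0           # fine-grained (unstructured) sparsity

w2d = w.reshape(8, -1)                   # one row per output channel
values, col_idx, row_ptr = to_csr(w2d)
y_sparse = sparse_conv2d(values, col_idx, row_ptr, x, 3, 3)
y_dense = (w2d @ im2col(x, 3, 3)).reshape(8, 6, 6)
```

The sparse path skips multiplications by zero, so its arithmetic cost shrinks as sparsity grows. The index-driven gather in `sparse_conv2d` is one plausible reading of the contiguity issue the abstract mentions: the surviving weights point at scattered rows, whereas a dense operator streams through contiguous memory.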

