LG · IT · May 5, 2021

Encoding Weights of Irregular Sparsity for Fixed-to-Fixed Model Compression

arXiv:2105.01869v2 · 4 citations
Originality: Incremental advance
AI Analysis

This work addresses the trade-off between compression ratio and parallelism in neural network pruning. It allows high fine-grained pruning rates to be kept without giving up the regular storage layout that parallel hardware needs, an incremental but practical advance for deployment.

The paper tackles the loss of parallelism caused by irregular sparsity in fine-grained pruning. It proposes a fixed-to-fixed encoding scheme that stores sparse neural networks in a regular structure, and it reports almost the maximum compression ratio for pruned Transformer and ResNet-50 models.

Even though fine-grained pruning techniques achieve a high compression ratio, conventional sparsity representations (such as CSR) associated with irregular sparsity degrade parallelism significantly. Practical pruning methods, thus, usually lower pruning rates (by structured pruning) to improve parallelism. In this paper, we study fixed-to-fixed (lossless) encoding architecture/algorithm to support fine-grained pruning methods such that sparse neural networks can be stored in a highly regular structure. We first estimate the maximum compression ratio of encoding-based compression using entropy. Then, as an effort to push the compression ratio to the theoretical maximum (by entropy), we propose a sequential fixed-to-fixed encoding scheme. We demonstrate that our proposed compression scheme achieves almost the maximum compression ratio for the Transformer and ResNet-50 pruned by various fine-grained pruning methods.
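The abstract mentions an entropy-based estimate of the maximum compression ratio but does not give the formula. The sketch below is only an illustration under stated assumptions, not the paper's derivation. It assumes a pruning rate `S` (the fraction of weights removed) and shows two simple quantities: the Shannon entropy of a Bernoulli(`S`) pruning mask, and a ratio bound of `1 / (1 - S)` that holds if pruned positions are treated as don't-care values so that only the kept fraction carries information. The function names `binary_entropy` and `dont_care_ratio_bound` are hypothetical.

```python
import math


def binary_entropy(p: float) -> float:
    """Shannon entropy, in bits, of a Bernoulli(p) source."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a deterministic source carries no information
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def dont_care_ratio_bound(pruning_rate: float) -> float:
    """Illustrative bound on the compression ratio.

    Assumption (not taken from the abstract): pruned positions are
    don't-care values, so only the kept fraction (1 - S) of the bits
    must be reproduced exactly by the decoder.
    """
    if not 0.0 <= pruning_rate < 1.0:
        raise ValueError("pruning_rate must be in [0, 1)")
    return 1.0 / (1.0 - pruning_rate)


if __name__ == "__main__":
    for s in (0.5, 0.8, 0.9):
        print(
            f"S={s:.1f}  mask entropy={binary_entropy(s):.3f} bits/weight  "
            f"ratio bound={dont_care_ratio_bound(s):.2f}x"
        )
```

Under this assumption, a higher pruning rate raises the achievable ratio, which is why an encoder that approaches the bound matters most for aggressively pruned models.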

