CV IV | Sep 6, 2023

Bandwidth-efficient Inference for Neural Image Compression

Shanzhi Yin, Tongda Xu, Yongsheng Liang, Yuanyuan Wang, Yanghao Li, Yan Wang, Jingjing Liu

arXiv:2309.02855v22.81 citations | h-index: 36

Originality: Incremental advance

AI Analysis

This paper addresses the bandwidth and energy cost of neural network inference on resource-constrained mobile and edge devices. It is an incremental improvement in optimization techniques.

The paper tackles the problem of limited communication bandwidth and power constraints for neural network inference on mobile/edge devices by proposing an end-to-end differentiable bandwidth-efficient inference method with activation compression. The result is up to 19x bandwidth reduction and 6.21x energy saving for low-level image compression tasks.

With neural networks growing deeper and feature maps growing larger, limited communication bandwidth with external memory (or DRAM) and power constraints become a bottleneck in implementing network inference on mobile and edge devices. In this paper, we propose an end-to-end differentiable, bandwidth-efficient neural inference method in which activations are compressed by a neural data compression method. Specifically, we propose a transform-quantization-entropy coding pipeline for activation compression, with symmetric exponential Golomb coding and a data-dependent Gaussian entropy model for arithmetic coding. Optimized together with existing model quantization methods, the low-level task of image compression can achieve up to 19x bandwidth reduction with 6.21x energy saving.
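The quantization and entropy-coding stages described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not define "symmetric exponential Golomb coding", so the code below assumes the standard order-0 signed exponential Golomb mapping as a stand-in, omits the learned transform entirely, and uses a fixed quantization step. All function names (`quantize`, `exp_golomb_encode`, `exp_golomb_decode`, `gaussian_bits`) and the step size are hypothetical and chosen for illustration only.

```python
import math

def quantize(values, step):
    """Uniform scalar quantization: map each float to an integer index."""
    return [round(v / step) for v in values]

def signed_to_unsigned(n):
    """Interleave signs: 0, 1, -1, 2, -2 -> 0, 1, 2, 3, 4."""
    return 2 * n - 1 if n > 0 else -2 * n

def unsigned_to_signed(k):
    """Inverse of signed_to_unsigned."""
    return (k + 1) // 2 if k % 2 == 1 else -(k // 2)

def exp_golomb_encode(n):
    """Order-0 signed exp-Golomb codeword for integer n, as a bit string."""
    bits = bin(signed_to_unsigned(n) + 1)[2:]
    # Prefix of (len - 1) zeros tells the decoder how many bits follow.
    return "0" * (len(bits) - 1) + bits

def exp_golomb_decode(bitstring):
    """Decode a concatenation of order-0 signed exp-Golomb codewords."""
    out, i = [], 0
    while i < len(bitstring):
        zeros = 0
        while bitstring[i] == "0":
            zeros += 1
            i += 1
        k = int(bitstring[i:i + zeros + 1], 2)
        i += zeros + 1
        out.append(unsigned_to_signed(k - 1))
    return out

def gaussian_bits(q, mean, scale):
    """Ideal code length (bits) of integer q under a Gaussian with the
    given mean and scale, discretized to unit-width bins. An arithmetic
    coder driven by this model would approach this length."""
    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - mean) / (scale * math.sqrt(2.0))))
    p = max(cdf(q + 0.5) - cdf(q - 0.5), 1e-12)  # floor avoids log2(0)
    return -math.log2(p)

# Toy "activation" values and an assumed quantization step of 0.5.
activations = [0.26, -0.74, 0.0, 1.1]
indices = quantize(activations, 0.5)
stream = "".join(exp_golomb_encode(q) for q in indices)
decoded = exp_golomb_decode(stream)
```

Here `indices` are the integers that would be written to external memory, `stream` is their exp-Golomb bitstream, and `gaussian_bits` gives the cost an arithmetic coder would pay per symbol if the mean and scale were predicted from the data. Small-magnitude indices get short codewords in both schemes, which is why a transform that concentrates activations near zero reduces the bits transferred.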

