CV | IV | Sep 6, 2023

Bandwidth-efficient Inference for Neural Image Compression

arXiv:2309.02855v21 citations | h-index: 36
Originality: Incremental advance
AI Analysis

The paper addresses memory-bandwidth and energy efficiency for neural network inference on resource-constrained mobile and edge devices. It is an incremental improvement in inference optimization techniques.

The paper tackles the limited external-memory (DRAM) bandwidth and power budget available for neural network inference on mobile and edge devices. It proposes an end-to-end differentiable, bandwidth-efficient inference method that compresses activations. Combined with existing model quantization methods, the reported result is up to 19x bandwidth reduction and 6.21x energy saving on the low-level task of image compression.

With neural networks growing deeper and feature maps growing larger, limited communication bandwidth with external memory (or DRAM) and power constraints become a bottleneck in implementing network inference on mobile and edge devices. In this paper, we propose an end-to-end differentiable, bandwidth-efficient neural inference method in which activations are compressed by a neural data compression method. Specifically, we propose a transform-quantization-entropy coding pipeline for activation compression, with symmetric exponential Golomb coding and a data-dependent Gaussian entropy model for arithmetic coding. Optimized together with existing model quantization methods, the low-level task of image compression can achieve up to 19x bandwidth reduction with 6.21x energy saving.
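The two entropy-coding ingredients named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation. It assumes that "symmetric exponential Golomb coding" behaves like the standard order-0 signed Exp-Golomb code (the sign-interleaving convention used for se(v) in H.264), and that the Gaussian entropy model assigns each quantized value the probability mass of a unit-width bin under a Gaussian with mean `mu` and scale `sigma`, as is common in learned compression. All function names here are hypothetical.

```python
import math
from typing import List


def signed_to_unsigned(v: int) -> int:
    """Interleave signs: 0, 1, -1, 2, -2 -> 0, 1, 2, 3, 4 (assumed convention)."""
    return 2 * v - 1 if v > 0 else -2 * v


def exp_golomb_encode(v: int) -> str:
    """Order-0 Exp-Golomb codeword for a signed integer, as a bit string."""
    u = signed_to_unsigned(v) + 1
    nbits = u.bit_length()
    # Prefix of (nbits - 1) zeros, then the binary form of u.
    return "0" * (nbits - 1) + format(u, "b")


def exp_golomb_decode(bits: str) -> List[int]:
    """Decode a concatenation of order-0 signed Exp-Golomb codewords."""
    out = []
    i = 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":  # count the zero prefix
            zeros += 1
            i += 1
        u = int(bits[i:i + zeros + 1], 2) - 1
        i += zeros + 1
        # Undo the sign interleaving: odd -> positive, even -> non-positive.
        out.append((u + 1) // 2 if u % 2 == 1 else -(u // 2))
    return out


def gaussian_cdf(x: float, mu: float, sigma: float) -> float:
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def gaussian_bits(q: int, mu: float, sigma: float) -> float:
    """Estimated code length (bits) of quantized value q under N(mu, sigma^2).

    The probability is the Gaussian mass on [q - 0.5, q + 0.5]; an ideal
    arithmetic coder would spend about -log2(p) bits on the symbol.
    """
    p = gaussian_cdf(q + 0.5, mu, sigma) - gaussian_cdf(q - 0.5, mu, sigma)
    return -math.log2(max(p, 1e-9))  # floor avoids log(0) for far-out values


if __name__ == "__main__":
    quantized = [0, 3, -2, 1]  # illustrative quantized activations
    stream = "".join(exp_golomb_encode(v) for v in quantized)
    print(stream, exp_golomb_decode(stream))
    print(sum(gaussian_bits(q, 0.0, 1.0) for q in quantized))
```

Under these assumptions, small-magnitude values get short Exp-Golomb codewords without any learned statistics, while the Gaussian model gives a differentiable bit estimate that a rate term in training could use. How the paper combines the two is not specified in the abstract.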
