LG PF · Jan 25, 2022

Bit-serial Weight Pools: Compression and Arbitrary Precision Execution of Neural Networks on Resource Constrained Processors

arXiv:2201.11651v1 · 3.36 citations

Originality: Highly original

AI Analysis

This paper addresses efficient neural network deployment on edge systems, offering a novel compression and acceleration method for microcontrollers.

The paper tackles the problem of deploying large neural networks on resource-constrained microcontrollers by proposing bit-serial weight pools, achieving up to 8x compression over 8-bit networks and more than 2.8x speedup over 8-bit weight pool networks, with less than 1% accuracy drop.

Applications of neural networks on edge systems have proliferated in recent years, but ever-increasing model sizes prevent neural networks from being deployed efficiently on resource-constrained microcontrollers. We propose bit-serial weight pools, an end-to-end framework that includes network compression and acceleration at arbitrary sub-byte precision. The framework can achieve up to 8x compression compared to 8-bit networks by sharing a pool of weights across the entire network. We further propose a bit-serial lookup-based software implementation that allows a runtime-bitwidth tradeoff and achieves more than 2.8x speedup and 7.5x storage compression compared to 8-bit weight pool networks, with less than 1% accuracy drop.
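
The abstract does not spell out the lookup scheme, so the following is only a minimal sketch of one way a bit-serial, lookup-based dot product over a shared weight pool could work. It is not the authors' implementation. The group length `GROUP`, the unsigned 8-bit activation format `ACT_BITS`, and the helper names `build_lut` and `bit_serial_dot` are all assumptions made for illustration. The idea shown: each pooled weight vector gets a table of its dot products with every 0/1 pattern, activations are processed one bit plane at a time, and dropping low bit planes trades precision for fewer lookups.

```python
import random

GROUP = 8      # assumed length of each pooled weight vector (hypothetical)
ACT_BITS = 8   # assumed unsigned activation precision (hypothetical)

def build_lut(pool):
    """For each pool vector, precompute its dot product with every GROUP-bit 0/1 pattern."""
    lut = []
    for vec in pool:
        row = [0] * (1 << GROUP)
        for pattern in range(1, 1 << GROUP):
            low = pattern & -pattern      # isolate the lowest set bit
            j = low.bit_length() - 1      # index of that bit
            row[pattern] = row[pattern ^ low] + vec[j]
        lut.append(row)
    return lut

def bit_serial_dot(lut, pool_index, acts, bits=ACT_BITS):
    """Dot product of one pool vector with acts, using only the top `bits` bit planes."""
    total = 0
    for k in range(ACT_BITS - bits, ACT_BITS):
        pattern = 0
        for j, a in enumerate(acts):
            pattern |= ((a >> k) & 1) << j   # gather bit k of every activation
        total += lut[pool_index][pattern] << k
    return total

# Usage: full precision matches the exact dot product; 4 bit planes match
# the dot product with the low 4 activation bits cleared.
rng = random.Random(0)
pool = [[rng.randint(-128, 127) for _ in range(GROUP)] for _ in range(4)]
lut = build_lut(pool)
acts = [rng.randint(0, 255) for _ in range(GROUP)]
exact = sum(w * a for w, a in zip(pool[2], acts))
truncated = sum(w * (a & 0xF0) for w, a in zip(pool[2], acts))
assert bit_serial_dot(lut, 2, acts) == exact
assert bit_serial_dot(lut, 2, acts, bits=4) == truncated
```

In this sketch a layer would store only one pool index per weight group, and the lookup table is built once per pool rather than once per layer, which is where sharing the pool across the network would pay off.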
