LGMar 7, 2022

Dynamic ConvNets on Tiny Devices via Nested Sparsity

Matteo Grimaldi, Luca Mocerino, Antonio Cipolletta, Andrea Calimera

arXiv:2203.03324v18.78 citationsh-index: 21

Originality Highly original

AI Analysis

This work addresses efficient inference for IoT edge devices, offering a novel compression and training pipeline that is incremental in improving dynamic model deployment.

The paper tackles the problem of deploying dynamic ConvNets on resource-constrained edge devices by introducing Nested Sparse ConvNets, which achieve comparable accuracy, remarkable storage savings, and high performance on an ARM-M7 MCU, outperforming naive variable-latency solutions and being Pareto optimal in accuracy vs. latency compared to state-of-the-art dynamic strategies.

This work introduces a new training and compression pipeline to build Nested Sparse ConvNets, a class of dynamic Convolutional Neural Networks (ConvNets) suited for inference tasks deployed on resource-constrained devices at the edge of the Internet-of-Things. A Nested Sparse ConvNet consists of a single ConvNet architecture containing N sparse sub-networks with nested weights subsets, like a Matryoshka doll, and can trade accuracy for latency at run time, using the model sparsity as a dynamic knob. To attain high accuracy at training time, we propose a gradient masking technique that optimally routes the learning signals across the nested weights subsets. To minimize the storage footprint and efficiently process the obtained models at inference time, we introduce a new sparse matrix compression format with dedicated compute kernels that fruitfully exploit the characteristic of the nested weights subsets. Tested on image classification and object detection tasks on an off-the-shelf ARM-M7 Micro Controller Unit (MCU), Nested Sparse ConvNets outperform variable-latency solutions naively built assembling single sparse models trained as stand-alone instances, achieving (i) comparable accuracy, (ii) remarkable storage savings, and (iii) high performance. Moreover, when compared to state-of-the-art dynamic strategies, like dynamic pruning and layer width scaling, Nested Sparse ConvNets turn out to be Pareto optimal in the accuracy vs. latency space.

View on arXiv PDF

Similar