LG · Oct 11, 2023

Enhancing Neural Architecture Search with Multiple Hardware Constraints for Deep Learning Model Deployment on Tiny IoT Devices

Alessio Burrello, Matteo Risso, Beatrice Alessandra Motetti, Enrico Macii, Luca Benini, Daniele Jahier Pagliari

arXiv:2310.07217v1 · 7.7 · 18 citations · h-index: 22 · Has Code

Originality: Highly original

AI Analysis

This work addresses the problem of efficient model deployment for IoT developers by enabling single-shot generation of models that meet specific memory and latency constraints, representing a significant improvement over iterative NAS methods.

The paper tackles the challenge of deploying deep learning models on tiny IoT devices by proposing a novel Neural Architecture Search (NAS) approach that incorporates multiple hardware constraints. It reduces memory by 87.4% and latency by 54.2% (amounts set by the user-defined targets) while keeping accuracy no worse than that of state-of-the-art hand-tuned models.

The rapid proliferation of computing domains relying on Internet of Things (IoT) devices has created a pressing need for efficient and accurate deep learning (DL) models that can run on low-power devices. However, traditional DL models tend to be too complex and computationally intensive for typical IoT end-nodes. To address this challenge, Neural Architecture Search (NAS) has emerged as a popular design automation technique for co-optimizing the accuracy and complexity of deep neural networks. Nevertheless, existing NAS techniques require many iterations to produce a network that adheres to specific hardware constraints, such as the maximum memory available on the hardware or the maximum latency allowed by the target application. In this work, we propose a novel approach to incorporate multiple constraints into so-called Differentiable NAS optimization methods, which allows the generation, in a single shot, of a model that respects user-defined constraints on both memory and latency, in a time comparable to a single standard training run. The proposed approach is evaluated on five IoT-relevant benchmarks, including the MLPerf Tiny suite and Tiny ImageNet, demonstrating that, with a single search, it is possible to reduce memory and latency by 87.4% and 54.2%, respectively (as defined by our targets), while ensuring accuracy non-inferior to that of state-of-the-art hand-tuned deep neural networks for TinyML.
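The abstract does not spell out the loss function, so what follows is only a minimal Python sketch, under our own assumptions, of how memory and latency constraints can be folded into a differentiable NAS objective. Each layer's output channels are scaled by a relaxed keep-fraction `alpha`; memory is approximated by the weight count (1 byte per int8 parameter), latency by MACs divided by a hypothetical throughput; and each budget adds a hinge penalty that is zero once its target is met. The names `ConvLayer`, `hardware_cost` and `constrained_loss`, the cost models, and the hinge shape are all illustrative choices, not the authors' formulation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ConvLayer:
    """One conv layer of a hypothetical seed network (illustrative only)."""
    c_in: int   # input channels (only used for the first layer)
    c_out: int  # output channels of the unpruned seed layer
    k: int      # square kernel size
    hw: int     # output spatial size, H * W


def hardware_cost(layers: List[ConvLayer], alphas: List[float]) -> Tuple[float, float]:
    """Return (parameter count, MAC count) of the network after channel scaling.

    alphas[i] in [0, 1] is the relaxed fraction of output channels kept in
    layer i; layer i's effective input channels are layer i-1's kept outputs.
    """
    params, macs = 0.0, 0.0
    prev_out = float(layers[0].c_in)  # network input channels are never pruned
    for layer, a in zip(layers, alphas):
        c_out = a * layer.c_out
        p = layer.k * layer.k * prev_out * c_out  # conv weights, biases ignored
        params += p
        macs += p * layer.hw                      # one MAC per weight per output pixel
        prev_out = c_out
    return params, macs


def constrained_loss(task_loss: float, layers: List[ConvLayer], alphas: List[float],
                     mem_target: float, lat_target: float,
                     lam_mem: float = 1.0, lam_lat: float = 1.0,
                     bytes_per_param: float = 1.0, macs_per_cycle: float = 1.0) -> float:
    """Task loss plus one hinge penalty per hardware budget (assumed form)."""
    params, macs = hardware_cost(layers, alphas)
    memory = params * bytes_per_param   # bytes, e.g. int8 weights
    latency = macs / macs_per_cycle     # cycles, a crude proxy for latency
    # Each penalty is normalized by its target and vanishes once the budget is met,
    # so a satisfied constraint stops pulling the architecture toward smaller sizes.
    mem_pen = max(0.0, memory - mem_target) / mem_target
    lat_pen = max(0.0, latency - lat_target) / lat_target
    return task_loss + lam_mem * mem_pen + lam_lat * lat_pen
```

For a two-layer toy network, `ConvLayer(3, 16, 3, 100)` followed by `ConvLayer(16, 32, 3, 25)`, the full-width model has 5040 weights and 158400 MACs; halving every layer's channels (`alphas=[0.5, 0.5]`) drops this to 1368 weights and 50400 MACs, which fits a 2000-byte / 100000-cycle budget, so `constrained_loss` returns the task loss unchanged. In an actual differentiable search, the `alphas` would be trainable tensors and `max` would become a ReLU, so that gradients of both penalties flow into the architecture parameters during the same training run as the weights.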
