BOMP-NAS: Bayesian Optimization Mixed Precision NAS
This work addresses the need for efficient quantization-aware neural architecture search, with the goal of reducing the computational cost and model size of deep learning applications. It represents an incremental improvement with measurable gains in model size and search time.
The paper tackles the problem of efficiently finding compact, high-performance neural networks under low-precision quantization by integrating quantization-aware fine-tuning into the neural architecture search loop. This integration allows a model size reduction of nearly 50% on CIFAR-10, and the search takes 6x less time than the closest related work.
Bayesian Optimization Mixed-Precision Neural Architecture Search (BOMP-NAS) is an approach to quantization-aware neural architecture search (QA-NAS) that leverages both Bayesian optimization (BO) and mixed-precision quantization (MP) to efficiently search for compact, high-performance deep neural networks. The results show that integrating quantization-aware fine-tuning (QAFT) into the NAS loop is a necessary step to find networks that perform well under low-precision quantization: integrating it allows a model size reduction of nearly 50% on the CIFAR-10 dataset. BOMP-NAS finds neural networks that achieve state-of-the-art performance at much lower design cost: its search time is 6x shorter than that of the closest related work.
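To make the link between mixed-precision quantization and model size concrete, the following minimal sketch computes weight storage when each layer is stored at its own bit width. The layer names, parameter counts, and bit widths are hypothetical values chosen for illustration only. They are not taken from the paper, the resulting reduction is unrelated to the reported CIFAR-10 result, and the helper `model_size_bytes` is an assumption of this sketch rather than part of BOMP-NAS.

```python
from typing import Dict


def model_size_bytes(param_counts: Dict[str, int], bit_widths: Dict[str, int]) -> float:
    """Return weight storage in bytes when each layer uses its own bit width."""
    total_bits = sum(param_counts[name] * bit_widths[name] for name in param_counts)
    return total_bits / 8


# Hypothetical per-layer parameter counts (illustrative, not from the paper).
params = {"conv1": 1_000, "conv2": 20_000, "fc": 10_000}

# Baseline: every layer stored at 8 bits.
uniform_8bit = {name: 8 for name in params}

# Hypothetical mixed-precision assignment: lower bit widths on the larger layers.
mixed = {"conv1": 8, "conv2": 4, "fc": 6}

size_uniform = model_size_bytes(params, uniform_8bit)
size_mixed = model_size_bytes(params, mixed)

# Fraction of weight storage saved relative to the uniform 8-bit baseline.
reduction = 1 - size_mixed / size_uniform

print(f"uniform 8-bit: {size_uniform:.0f} bytes")
print(f"mixed precision: {size_mixed:.0f} bytes")
print(f"reduction: {reduction:.1%}")
```

A per-layer bit-width assignment like `mixed` is the kind of quantity a mixed-precision search would presumably trade off against accuracy. How BOMP-NAS encodes and explores that choice is not specified in the text above.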