Accelerator-aware Neural Network Design using AutoML
This work addresses the challenge of deploying efficient neural networks on low-power edge devices. The contribution is incremental, since it builds on existing AutoML and hardware-aware design methods.
The paper uses hardware-aware neural architecture search to design models optimized for Google's Edge TPU. The resulting models achieve real-time image classification on edge devices with accuracy comparable to larger data center models, and they improve the accuracy-latency tradeoff over existing state-of-the-art mobile models.
While neural network hardware accelerators provide a substantial amount of raw compute throughput, the models deployed on them must be co-designed for the underlying hardware architecture to obtain optimal system performance. We present a class of computer vision models designed using hardware-aware neural architecture search and customized to run on the Edge TPU, Google's neural network hardware accelerator for low-power edge devices. For the Edge TPU in Coral devices, these models enable real-time image classification performance while achieving accuracy typically seen only with larger, compute-heavy models running in data centers. On Pixel 4's Edge TPU, these models improve the accuracy-latency tradeoff over existing state-of-the-art (SoTA) mobile models.
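To make the notion of an accuracy-latency tradeoff concrete, the following is a minimal sketch of how candidate architectures could be ranked once their accuracy and on-device latency are known. It is not the search procedure used in this work, which the abstract does not specify. The soft-constraint score (accuracy scaled by a power of the latency ratio, in the style of MnasNet), the exponent `w = -0.07`, the `Candidate` type, and all candidate names and numbers are assumptions chosen purely for illustration; they are not measurements from any device.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str          # hypothetical architecture label
    accuracy: float    # top-1 accuracy in [0, 1]
    latency_ms: float  # latency measured on the target accelerator


def latency_aware_score(c: Candidate, target_ms: float, w: float = -0.07) -> float:
    """Soft-constraint objective: accuracy * (latency / target) ** w.

    With w < 0, models slower than the target are penalized and
    faster ones get a small bonus. The form and default w are assumptions.
    """
    return c.accuracy * (c.latency_ms / target_ms) ** w


def pareto_front(cands: list[Candidate]) -> list[Candidate]:
    """Return candidates not dominated in (higher accuracy, lower latency)."""
    front = []
    for c in cands:
        dominated = any(
            o.accuracy >= c.accuracy
            and o.latency_ms <= c.latency_ms
            and (o.accuracy > c.accuracy or o.latency_ms < c.latency_ms)
            for o in cands
        )
        if not dominated:
            front.append(c)
    return sorted(front, key=lambda c: c.latency_ms)


# Placeholder values for illustration only, not real benchmark results.
candidates = [
    Candidate("small", 0.70, 2.0),
    Candidate("medium", 0.75, 4.0),
    Candidate("slow_medium", 0.74, 6.0),
    Candidate("large", 0.78, 8.0),
]

front = pareto_front(candidates)
best = max(candidates, key=lambda c: latency_aware_score(c, target_ms=4.0))

print([c.name for c in front])
print(best.name)
```

A model family improves the tradeoff over another when its points push the Pareto front outward, that is, when it offers higher accuracy at the same latency or lower latency at the same accuracy. The scalar score is one way to collapse the two metrics into a single search objective for a given latency target.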