DC · CV · LG · Apr 9, 2022

Searching for Efficient Neural Architectures for On-Device ML on Edge TPUs

arXiv:2204.14007v1 · 15 citations · h-index: 26
Originality: Incremental advance
AI Analysis

This work addresses practical limitations in scaling NAS for multiple tasks and platforms, offering incremental improvements for on-device ML deployment on edge devices.

The paper tackles the challenge of efficiently utilizing on-device ML accelerators like Edge TPUs by developing a neural architecture search (NAS) approach with a decoupled infrastructure and group convolution-based search spaces, resulting in improved quality-performance trade-offs for computer vision and NLP tasks on the Google Tensor SoC.

On-device ML accelerators are becoming standard in modern mobile systems-on-chip (SoCs). Neural architecture search (NAS) comes to the rescue for efficiently utilizing the high compute throughput offered by these accelerators. However, existing NAS frameworks have several practical limitations in scaling to multiple tasks and different target platforms. In this work, we provide a two-pronged approach to this challenge: (i) a NAS-enabling infrastructure that decouples model cost evaluation, search space design, and the NAS algorithm to rapidly target various on-device ML tasks, and (ii) search spaces crafted from group-convolution-based inverted bottleneck (IBN) variants that provide flexible quality/performance trade-offs on ML accelerators, complementing the existing full- and depthwise-convolution-based IBNs. Using this approach, we target a state-of-the-art mobile platform, Google Tensor SoC, and demonstrate neural architectures that improve the quality-performance Pareto frontier for various computer vision (classification, detection, segmentation) as well as natural language processing tasks.
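To make the trade-off between the three IBN families concrete, the following is a minimal sketch that counts multiply-accumulate operations (MACs) for each variant. It is not taken from the paper: the function names `conv_macs` and `ibn_macs` are hypothetical, the block layouts are assumptions (depthwise IBN as 1x1 expand, k x k depthwise, 1x1 project; full IBN as a k x k full-convolution expand followed by a 1x1 project; group IBN as the same with the k x k convolution split into groups), and all convolutions are assumed to be stride 1 with same padding. MAC counts are only a proxy for cost, since latency on an accelerator need not track them.

```python
def conv_macs(h, w, c_in, c_out, k, groups=1):
    """MACs of a stride-1, same-padded k x k convolution on an h x w feature map."""
    if c_in % groups or c_out % groups:
        raise ValueError("c_in and c_out must both be divisible by groups")
    # Each output element reads k*k*(c_in/groups) inputs; there are h*w*c_out outputs.
    return h * w * k * k * (c_in // groups) * c_out


def ibn_macs(h, w, c, expansion, k, variant, groups=1):
    """MACs of one inverted bottleneck block with c input and c output channels.

    The three layouts below are illustrative assumptions, not the paper's
    exact block definitions.
    """
    c_mid = c * expansion
    project = conv_macs(h, w, c_mid, c, 1)  # 1x1 projection back to c channels
    if variant == "depthwise":
        expand = conv_macs(h, w, c, c_mid, 1)  # 1x1 expansion
        spatial = conv_macs(h, w, c_mid, c_mid, k, groups=c_mid)  # depthwise k x k
        return expand + spatial + project
    if variant == "full":
        return conv_macs(h, w, c, c_mid, k) + project  # k x k full-conv expansion
    if variant == "group":
        return conv_macs(h, w, c, c_mid, k, groups=groups) + project
    raise ValueError(f"unknown variant: {variant!r}")


if __name__ == "__main__":
    # Hypothetical layer shape chosen only for illustration.
    shape = dict(h=14, w=14, c=64, expansion=4, k=3)
    for variant, groups in [("depthwise", 1), ("group", 4), ("full", 1)]:
        macs = ibn_macs(variant=variant, groups=groups, **shape)
        print(f"{variant:>9} (groups={groups}): {macs:,} MACs")
```

Under these assumptions, the group variant falls between the depthwise and full variants in compute, and the `groups` parameter moves it along that range, which is one way to read the "flexible quality/performance trade-offs" described in the abstract.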

