LG · Dec 12, 2020

Efficient Incorporation of Multiple Latency Targets in the Once-For-All Network

arXiv:2012.06748v1
AI Analysis

This work offers an incremental improvement for machine learning engineers and researchers working with neural architecture search: it makes the OFA search phase more efficient when subnetworks are needed for several latency targets.

This paper addresses the inefficiency of the Once-for-All (OFA) network in incorporating multiple latency targets during its search phase. The authors propose two strategies, Top-down and Bottom-up, which utilize warm starting and randomized network pruning to significantly improve running time performance without sacrificing subnetwork accuracy across various latency targets and design spaces.

Neural Architecture Search has proven an effective method of automating architecture engineering. Recent work in the field looks for architectures subject to multiple objectives, such as accuracy and latency, so that they can be deployed efficiently on different target hardware. Once-for-All (OFA) is one such method: it decouples training and search and is able to find high-performance networks for different latency constraints. However, the search phase is inefficient at incorporating multiple latency targets. In this paper, we introduce two strategies (Top-down and Bottom-up) that use warm starting and randomized network pruning for the efficient incorporation of multiple latency targets in the OFA network. We evaluate these strategies against the current OFA implementation and demonstrate that they offer significant running time gains without sacrificing the accuracy of the subnetworks found for each latency target. We further demonstrate that these gains generalize to every design space used by the OFA network.
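
The abstract names the ingredients (warm starting, randomized pruning, a top-down ordering of latency targets) but does not spell out the algorithm. The sketch below is therefore only a toy illustration of the general idea, not the authors' implementation and not the OFA API. The design space, the `latency` and `accuracy` proxies, and every function name are hypothetical stand-ins chosen for illustration.

```python
import random

# Hypothetical toy design space: a subnetwork is a list of per-block depths.
NUM_BLOCKS = 5
MIN_DEPTH, MAX_DEPTH = 1, 4


def latency(net):
    # Made-up latency proxy (a stand-in for a measured latency table).
    return sum(net)


def accuracy(net):
    # Made-up accuracy proxy with diminishing returns on depth.
    return sum(d ** 0.5 for d in net)


def random_prune(net, target, rng):
    # Shrink randomly chosen blocks until the latency target is met.
    net = list(net)
    while latency(net) > target:
        shrinkable = [i for i, d in enumerate(net) if d > MIN_DEPTH]
        if not shrinkable:
            break  # cannot shrink further; target is infeasible
        net[rng.choice(shrinkable)] -= 1
    return net


def mutate(net, rng):
    # Resample the depth of one randomly chosen block.
    net = list(net)
    net[rng.randrange(len(net))] = rng.randint(MIN_DEPTH, MAX_DEPTH)
    return net


def evolve(population, target, rng, generations=20):
    # Minimal evolutionary loop: keep the top half, refill with pruned mutants.
    population = [list(net) for net in population]
    size = len(population)
    for _ in range(generations):
        population.sort(key=accuracy, reverse=True)
        parents = population[: size // 2]
        children = [
            random_prune(mutate(rng.choice(parents), rng), target, rng)
            for _ in range(size - len(parents))
        ]
        population = parents + children
    return population


def top_down_search(targets, pop_size=16, seed=0):
    # Visit targets from the largest latency budget to the smallest,
    # reusing (warm starting from) the previous target's final population.
    rng = random.Random(seed)
    population = [
        [rng.randint(MIN_DEPTH, MAX_DEPTH) for _ in range(NUM_BLOCKS)]
        for _ in range(pop_size)
    ]
    results = {}
    for target in sorted(targets, reverse=True):
        # Warm start: prune the inherited population to fit the new budget.
        population = [random_prune(net, target, rng) for net in population]
        population = evolve(population, target, rng)
        results[target] = max(population, key=accuracy)
    return results


if __name__ == "__main__":
    for target, net in top_down_search([8, 16, 12]).items():
        print(target, net, latency(net))
```

In this toy setup the saving comes from not restarting the search from a random population for each target. A bottom-up variant would presumably visit the targets in increasing order and grow networks instead of pruning them, but that is an assumption, since the abstract does not describe it.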

Code Implementations: 1 repo