LG · NE · SP · ML · Jul 17, 2020

Standing on the Shoulders of Giants: Hardware and Neural Architecture Co-Search with Hot Start

arXiv:2007.09087v1 · 68 citations
AI Analysis

This work aims to make hardware and neural architecture co-search practical on commodity hardware by cutting its computational cost. The contribution is incremental, since it builds on existing model zoos and co-search methods.

The paper targets the inefficiency of hardware and neural architecture co-search frameworks, which require hundreds of GPU hours. It proposes HotNAS, a framework that starts from pre-trained models and thereby reduces search time from 200 GPU hours to less than 3 GPU hours. On ImageNet, under a 5 ms timing constraint, it achieves up to 5.79% Top-1 and 3.97% Top-5 accuracy gain.

Hardware and neural architecture co-search, which automatically generates Artificial Intelligence (AI) solutions from a given dataset, is a promising route to AI democratization; however, current co-search frameworks require on the order of hundreds of GPU hours for one target hardware. This inhibits the use of such frameworks on commodity hardware. The root cause of the low efficiency in existing co-search frameworks is that they start from a "cold" state (i.e., search from scratch). In this paper, we propose a novel framework, namely HotNAS, that starts from a "hot" state based on a set of existing pre-trained models (a.k.a. model zoo) to avoid lengthy training time. As such, the search time can be reduced from 200 GPU hours to less than 3 GPU hours. In HotNAS, in addition to the hardware design space and the neural architecture search space, we further integrate a compression space to conduct model compression during the co-search, which creates new opportunities to reduce latency but also brings challenges. One of the key challenges is that all of the above search spaces are coupled with each other, e.g., compression may not work without hardware design support. To tackle this issue, HotNAS builds a chain of tools to design hardware to support compression, based on which a global optimizer is developed to automatically co-search all the involved search spaces. Experiments on the ImageNet dataset and a Xilinx FPGA show that, within the timing constraint of 5 ms, neural architectures generated by HotNAS can achieve up to 5.79% Top-1 and 3.97% Top-5 accuracy gain, compared with the existing ones.
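
The "hot start" idea in the abstract can be illustrated with a toy sketch. This is not the HotNAS algorithm or its global optimizer: it is a brute-force enumeration over three small, invented search spaces (a model zoo, compression options, and hardware options), assuming that latency scales multiplicatively and that compression costs a fixed number of accuracy points. All names (`Candidate`, `hot_start_search`, `MODEL_ZOO`, and so on) and all numbers are hypothetical and chosen only to show how a latency constraint couples the spaces.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Candidate:
    name: str
    top1: float        # hypothetical Top-1 accuracy in percent
    latency_ms: float  # hypothetical latency on the target hardware


# Hypothetical pre-trained models (the "hot" starting points).
MODEL_ZOO = [
    Candidate("zoo_small", 70.0, 4.0),
    Candidate("zoo_medium", 74.0, 7.0),
    Candidate("zoo_large", 77.0, 12.0),
]

# Hypothetical compression options: (label, latency scale, accuracy drop in points).
COMPRESSION = [("none", 1.0, 0.0), ("light", 0.8, 0.5), ("heavy", 0.6, 2.0)]

# Hypothetical hardware design options: (label, latency scale).
HARDWARE = [("base", 1.0), ("tuned", 0.85)]


def hot_start_search(zoo, compression, hardware, limit_ms):
    """Return the most accurate candidate meeting the latency limit, or None."""
    best = None
    for model, (c_name, c_scale, c_drop), (h_name, h_scale) in product(
        zoo, compression, hardware
    ):
        # Assumed latency model: both choices scale latency multiplicatively.
        latency = model.latency_ms * c_scale * h_scale
        if latency > limit_ms:
            continue  # violates the timing constraint
        cand = Candidate(
            f"{model.name}+{c_name}+{h_name}", model.top1 - c_drop, latency
        )
        if best is None or cand.top1 > best.top1:
            best = cand
    return best


if __name__ == "__main__":
    print(hot_start_search(MODEL_ZOO, COMPRESSION, HARDWARE, limit_ms=5.0))
```

With these invented numbers, the medium model under light compression misses the 5 ms limit on the base hardware (about 5.6 ms) and only fits once the tuned hardware option is chosen as well. This mirrors, in a simplified way, the abstract's point that the search spaces are coupled and have to be explored jointly.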

Code Implementations (1 repo)