LG, Mar 10, 2021

Trainless Model Performance Estimation for Neural Architecture Search

arXiv:2103.08312v2 1.6

Originality: Incremental advance

AI Analysis

This work addresses the need for faster and more efficient neural architecture search by providing a trainless estimation method, though it is incremental as it builds on existing NAS benchmarks and focuses on stability against initializations.

The paper estimates neural architecture performance without training by analyzing the coefficient of variation of untrained accuracy (CV_U) across multiple weight initializations. Selecting the architectures with the lowest CV_U yields average accuracies of 91.90 ± 2.27 on CIFAR-10, 64.08 ± 5.63 on CIFAR-100, and 38.76 ± 6.62 on ImageNet16-120, all statistically above the random baseline.
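For reference, the coefficient of variation is conventionally the standard deviation divided by the mean. A plausible way to write CV_U is sketched below. The symbols $a_i$ (untrained accuracy under the $i$-th initialization), $N$ (number of initializations), $\mu_U$ and $\sigma_U$ are notation assumed here for illustration. The $1/N$ normalization of the variance is also an assumption, since the summary does not say whether a population or sample estimate is used.

```latex
\[
  \mu_U = \frac{1}{N}\sum_{i=1}^{N} a_i, \qquad
  \sigma_U = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(a_i - \mu_U\right)^2}, \qquad
  CV_U = \frac{\sigma_U}{\mu_U}
\]
```

Under this reading, a low $CV_U$ means the untrained accuracy changes little, relative to its mean, when the weights are re-initialized.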

Neural architecture search has become an indispensable part of the deep learning field. Modern methods make it possible to find one of the best performing architectures, or to build one from scratch, but they typically base their decisions on trained accuracy. In the present article we explore instead how the architectural component of a neural network affects its predictive power. We focus on the relationship between the trained accuracy of an architecture and its accuracy prior to training, by considering statistics over multiple initialisations. We observe that minimising the coefficient of variation of the untrained accuracy, $CV_{U}$, consistently leads to better performing architectures. We test $CV_{U}$ as a neural architecture search scoring metric using the NAS-Bench-201 database of trained neural architectures. The architectures with the lowest $CV_{U}$ value have on average an accuracy of $91.90 \pm 2.27$, $64.08 \pm 5.63$ and $38.76 \pm 6.62$ for CIFAR-10, CIFAR-100 and a downscaled version of ImageNet, respectively. Since these values are statistically above the random baseline, we conclude that a good architecture should be stable against weight initialisations. It takes about $190$ s for CIFAR-10 and CIFAR-100 and $133.9$ s for ImageNet16-120 to process $100$ architectures, on a batch of $256$ images, with $100$ initialisations.
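The scoring idea in the abstract can be illustrated with a minimal sketch. It takes $CV_{U}$ to be the standard deviation of the untrained accuracy divided by its mean, computed over several initialisations, and ranks candidates from lowest to highest score. This is not the authors' code: the function names `cv_u` and `rank_by_cv_u`, the toy accuracy values, and the use of the population standard deviation are all assumptions made for illustration. In practice each list would hold the accuracies of one untrained architecture on a single batch under different random initialisations.

```python
from statistics import mean, pstdev


def cv_u(untrained_accuracies):
    """Coefficient of variation (std / mean) of untrained accuracies.

    Assumption: population standard deviation; the source does not say
    whether a population or sample estimate is used.
    """
    mu = mean(untrained_accuracies)
    if mu == 0:
        # The ratio is undefined for a zero mean; rank such cases last.
        return float("inf")
    return pstdev(untrained_accuracies) / mu


def rank_by_cv_u(acc_by_arch):
    """Return architecture names ordered from lowest to highest CV_U."""
    return sorted(acc_by_arch, key=lambda name: cv_u(acc_by_arch[name]))


# Hypothetical untrained accuracies (in %), one value per initialisation.
toy = {
    "arch_a": [10.0, 10.0, 10.0, 10.0],  # identical across initialisations
    "arch_b": [8.0, 12.0, 8.0, 12.0],    # moderate spread
    "arch_c": [5.0, 15.0, 5.0, 15.0],    # large spread
}

ranking = rank_by_cv_u(toy)
best = ranking[0]  # candidate with the most stable untrained accuracy
```

In this toy data all three candidates share the same mean untrained accuracy, so the ranking is driven purely by the spread across initialisations, which is the property the abstract associates with better trained accuracy.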
