Rethink DARTS Search Space and Renovate a New Benchmark
This work addresses a critical issue in NAS benchmarking by providing a more robust and comprehensive evaluation framework. The contribution is incremental in that it builds upon existing DSS improvements.
The authors tackle the narrow accuracy range of the DARTS search space (DSS) benchmark for neural architecture search (NAS), which can obscure method rankings. They propose a larger and harder DSS, called LHD, and build a new benchmark on it with improved discernibility and accessibility, evaluating twelve baselines across twelve conditions.
The DARTS search space (DSS) has become a canonical benchmark for neural architecture search (NAS), but some emerging works have pointed out its narrow accuracy range and claimed that it hurts method ranking. We observe that some recent studies already suffer from this issue, which overshadows the meaning of their scores. In this work, we first propose and orchestrate a suite of improvements to frame a larger and harder DSS, termed LHD, while retaining high efficiency in search. We then build a new LHD-based benchmark that takes care of both discernibility and accessibility. Specifically, we re-implement twelve baselines and evaluate them across twelve conditions by combining two underexplored influential factors, transductive robustness and discretization policy, so that the benchmark rests on multi-condition evaluation. Considering that tabular benchmarks are always insufficient to adequately evaluate NAS methods, our work can serve as a crucial basis for the future progress of NAS. https://github.com/chaoji90/LHD
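The claim that a narrow accuracy range hurts method ranking can be illustrated with a small synthetic simulation. The sketch below is not taken from the paper or the LHD repository: the score ranges, the noise level, and the helper names `kendall_tau` and `mean_rank_agreement` are all assumptions chosen for illustration. It spaces twelve hypothetical methods evenly over a score range, adds Gaussian evaluation noise, and measures how well the noisy ranking agrees with the true one.

```python
import random
from itertools import combinations


def kendall_tau(a, b):
    """Kendall rank correlation between two equal-length score lists.

    Tied pairs count as neither concordant nor discordant.
    """
    concordant = discordant = 0
    for i, j in combinations(range(len(a)), 2):
        sign = (a[i] - a[j]) * (b[i] - b[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    pairs = len(a) * (len(a) - 1) // 2
    return (concordant - discordant) / pairs


def mean_rank_agreement(score_range, noise_std, n_methods=12, trials=200, seed=0):
    """Average Kendall tau between true and noisy rankings (synthetic data)."""
    rng = random.Random(seed)
    # Hypothetical "true" scores, evenly spaced across the given range.
    step = score_range / (n_methods - 1)
    true_scores = [i * step for i in range(n_methods)]
    total = 0.0
    for _ in range(trials):
        # Each trial simulates one noisy evaluation of every method.
        observed = [s + rng.gauss(0.0, noise_std) for s in true_scores]
        total += kendall_tau(true_scores, observed)
    return total / trials


# Same assumed noise level, two assumed score ranges (in accuracy points).
narrow = mean_rank_agreement(score_range=0.3, noise_std=0.1)
wide = mean_rank_agreement(score_range=3.0, noise_std=0.1)
print(f"narrow range tau: {narrow:.2f}, wide range tau: {wide:.2f}")
```

Under these assumptions, the wider range yields a noisy ranking that tracks the true ranking more closely, which is the intuition behind seeking a search space with better discernibility.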