DARTS without a Validation Set: Optimizing the Marginal Likelihood
This work addresses the unreliability of architecture extraction in weight-sharing NAS, a problem relevant to both researchers and practitioners. Its contribution is incremental, since it builds on prior work that already used TSE in place of the validation loss in DARTS.
The paper studies DARTS-based neural architecture search in which the Training-Speed-Estimate (TSE) replaces the validation loss. This substitution prevents skip connection collapse and improves performance on NASBench-201 and the original DARTS search space. Applying DARTS diagnostics, the authors find several unusual behaviors that arise from not using a validation set, and show that the depth gap and topology selection can strongly hurt search performance.
The success of neural architecture search (NAS) has historically been limited by excessive compute requirements. While modern weight-sharing NAS methods such as DARTS can finish the search in single-digit GPU days, extracting the final best architecture from the shared weights is notoriously unreliable. Training-Speed-Estimate (TSE), a recently developed generalization estimator with a Bayesian marginal likelihood interpretation, has previously been used in place of the validation loss for gradient-based optimization in DARTS. This prevents the DARTS skip connection collapse, which significantly improves performance on NASBench-201 and the original DARTS search space. We extend those results by applying various DARTS diagnostics and show several unusual behaviors arising from not using a validation set. Furthermore, our experiments yield concrete examples of the depth gap and topology selection in DARTS having a strongly negative impact on search performance, despite these factors generally receiving limited attention in the literature compared to operation selection.
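As a rough illustration of what using TSE in place of the validation loss means for the architecture update, the sketch below replaces the validation-loss hypergradient of DARTS with the gradient of a sum of training losses collected along a short SGD trajectory. It assumes TSE is the sum of mini-batch training losses, each recorded before the weight update on that batch. The three candidate ops, the scalar toy data, and every name (`OPS`, `tse`, `tse_arch_step`) are hypothetical, and finite differences stand in for the automatic differentiation a real implementation would use. This is not the authors' code; their unrolling depth and gradient estimator may differ.

```python
import math
from typing import Callable, List, Sequence, Tuple

Batch = List[Tuple[float, float]]  # (x, y) training pairs

# Hypothetical candidate operations on one DARTS-style edge, each mapping
# (input x, that op's scalar weight w) to an output. "linear" stands in for a
# parametric op such as a convolution; skip and zero ignore their weight.
OPS: List[Callable[[float, float], float]] = [
    lambda x, w: x,        # skip_connect
    lambda x, w: w * x,    # linear (hypothetical parametric op)
    lambda x, w: 0.0,      # none (zero op)
]


def softmax(alpha: Sequence[float]) -> List[float]:
    m = max(alpha)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in alpha]
    total = sum(exps)
    return [e / total for e in exps]


def batch_loss(alpha: Sequence[float], w: Sequence[float], batch: Batch) -> float:
    """Mean squared error of the softmax-weighted mixed operation on one batch."""
    p = softmax(alpha)
    err = 0.0
    for x, y in batch:
        out = sum(pk * op(x, wk) for pk, op, wk in zip(p, OPS, w))
        err += (out - y) ** 2
    return err / len(batch)


def fd_grad(f: Callable[[List[float]], float], params: Sequence[float],
            eps: float = 1e-5) -> List[float]:
    """Central finite-difference gradient; a stand-in for autodiff in this sketch."""
    grad = []
    for i in range(len(params)):
        up, down = list(params), list(params)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def tse(alpha: Sequence[float], w0: Sequence[float], batches: List[Batch],
        lr: float = 0.1) -> float:
    """Sum of per-batch training losses along a plain SGD trajectory on w.

    Each loss is recorded before the update on that batch, so the sum is
    lower when the weights fit each new batch quickly.
    """
    w = list(w0)
    total = 0.0
    for batch in batches:
        total += batch_loss(alpha, w, batch)
        g = fd_grad(lambda ww: batch_loss(alpha, ww, batch), w)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return total


def tse_arch_step(alpha: Sequence[float], w0: Sequence[float],
                  batches: List[Batch], arch_lr: float = 0.1,
                  lr: float = 0.1) -> Tuple[List[float], List[float]]:
    """One gradient step on alpha that minimizes TSE instead of a validation loss."""
    g_alpha = fd_grad(lambda a: tse(a, w0, batches, lr), alpha)
    new_alpha = [a - arch_lr * g for a, g in zip(alpha, g_alpha)]
    return new_alpha, g_alpha


# Toy data: target y = -2x, one example per batch (each batch has mean x^2 = 1).
batches = [[(x, -2.0 * x)] for x in (1.0, -1.0, 1.0)]
alpha0, w0 = [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]
tse0 = tse(alpha0, w0, batches)
alpha1, g_alpha = tse_arch_step(alpha0, w0, batches)
tse1 = tse(alpha1, w0, batches)
print(f"TSE before: {tse0:.4f}, after one alpha step: {tse1:.4f}")
```

No validation split appears anywhere: `tse` only touches training batches. Because the softmax is unchanged when the same constant is added to every α, the components of `g_alpha` should sum to roughly zero, which gives a quick sanity check on the gradient.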