ParZC: Parametric Zero-Cost Proxies for Efficient NAS
This work addresses a critical issue in zero-shot NAS, improving both the efficiency and the accuracy of architecture ranking for researchers and practitioners, though the contribution is incremental because it builds on existing zero-cost proxy methods.
The paper tackles the problem of inefficient and indiscriminate node aggregation in zero-shot Neural Architecture Search (NAS). It introduces the Parametric Zero-Cost Proxies (ParZC) framework, which combines a Mixer Architecture with Bayesian Network (MABN) and a DiffKendall loss to improve performance estimation, and it reports better results than existing zero-shot NAS methods on the NAS-Bench-101, NAS-Bench-201, and NDS benchmarks.
Recent advancements in zero-shot Neural Architecture Search (NAS) highlight the efficacy of zero-cost proxies across various NAS benchmarks. Several studies propose the automated design of zero-cost proxies to achieve state-of-the-art (SOTA) performance, but these approaches require a tedious search process. Furthermore, we identify a critical issue with current zero-cost proxies: they aggregate node-wise zero-cost statistics without considering that not all nodes in a neural network contribute equally to performance estimation. Our observations reveal that node-wise zero-cost statistics vary significantly in their contributions to performance, with each node exhibiting a degree of uncertainty. Based on this insight, we introduce a novel framework, Parametric Zero-Cost Proxies (ParZC), which enhances the adaptability of zero-cost proxies through parameterization. To address the indiscriminate treatment of nodes, we propose a Mixer Architecture with Bayesian Network (MABN) that explores the node-wise zero-cost statistics and estimates node-specific uncertainty. Moreover, we propose DiffKendall, a loss function that directly optimizes Kendall's Tau coefficient in a differentiable manner, so that ParZC can better handle discrepancies in architecture rankings. Comprehensive experiments on NAS-Bench-101, NAS-Bench-201, and NDS demonstrate the superiority of ParZC over existing zero-shot NAS methods. Additionally, we demonstrate the versatility and adaptability of ParZC by transferring it to the Vision Transformer search space.
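To make the idea behind a differentiable Kendall's Tau loss concrete, the following is a minimal sketch, not the paper's implementation. The abstract does not give the DiffKendall formula, so the sketch assumes one common relaxation: the non-differentiable sign of each predicted score difference is replaced by tanh(alpha * difference). The function names and the sharpness parameter `alpha` are hypothetical.

```python
import math
from itertools import combinations


def _sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def kendall_tau(pred, target):
    """Exact Kendall's Tau (tau-a): (concordant - discordant) / number of pairs."""
    if len(pred) != len(target) or len(pred) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    pairs = list(combinations(range(len(pred)), 2))
    total = sum(
        _sign(pred[i] - pred[j]) * _sign(target[i] - target[j]) for i, j in pairs
    )
    return total / len(pairs)


def soft_kendall_tau(pred, target, alpha=10.0):
    """Smooth surrogate: tanh(alpha * d) stands in for sign(d) on predictions.

    Assumption: this tanh relaxation is illustrative and may differ from the
    exact DiffKendall formulation. Larger alpha approaches the exact
    coefficient; smaller alpha gives smoother gradients.
    """
    if len(pred) != len(target) or len(pred) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    pairs = list(combinations(range(len(pred)), 2))
    total = sum(
        math.tanh(alpha * (pred[i] - pred[j])) * _sign(target[i] - target[j])
        for i, j in pairs
    )
    return total / len(pairs)


def rank_loss(pred, target, alpha=10.0):
    """Loss to minimize: the negative of the smooth rank correlation."""
    return -soft_kendall_tau(pred, target, alpha)


# Proxy scores for three architectures and their ground-truth accuracies.
scores = [0.1, 0.5, 0.9]
accuracies = [70.0, 80.0, 90.0]
good = rank_loss(scores, accuracies)        # scores rank the architectures correctly
bad = rank_loss(scores[::-1], accuracies)   # scores rank them in reverse
```

Because tanh has a nonzero derivative everywhere, a predictor trained with such a loss receives a gradient signal from every mis-ordered pair, which a loss built on the exact sign function would not provide. In an actual training setup the same expression would be written with an automatic-differentiation library rather than plain Python floats.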