LG · May 8

Zero-Shot Neural Network Evaluation with Sample-Wise Activation Patterns

Yameng Peng, Andy Song, Haytham M. Fayek, Vic Ciesielski, Xiaojun Chang

arXiv:2605.07378 · 59.0 · Has Code

AI Analysis

For researchers in neural architecture search and model evaluation, SWAP-Score provides a universal, label-independent zero-shot metric that works across architectures and tasks, outperforming prior metrics.

The paper introduces SWAP-Score, a zero-shot metric for evaluating neural networks without training, which achieves high correlation with ground-truth performance across CNNs and Transformers on vision and NLP tasks (e.g., a Spearman's correlation of 0.93 on CIFAR-10 for DARTS CNNs and 0.71 on GLUE for FlexiBERT). SWAP-NAS, which uses this metric, finds competitive architectures in approximately 6 minutes of GPU time on CIFAR-10 and approximately 9 minutes on ImageNet.
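The Spearman figures quoted above measure how well a metric's ranking of architectures agrees with their ranking by trained accuracy. The following is a minimal, standard-library sketch of that computation (Pearson correlation applied to average ranks). The function names `average_ranks` and `spearman` are hypothetical, and the scores and accuracies in the usage lines are made-up illustrative values, not results from the paper.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with the one at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Illustrative only: hypothetical zero-shot scores and trained accuracies.
zero_shot_scores = [120.0, 340.0, 95.0, 410.0, 280.0]
trained_accuracy = [0.89, 0.93, 0.90, 0.94, 0.92]
print(spearman(zero_shot_scores, trained_accuracy))
```

A value near 1 means the metric orders architectures almost exactly as their trained accuracy does, which is the property a zero-shot proxy needs for NAS.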

Zero-shot proxies, also known as training-free metrics, are widely adopted to reduce the computational overhead of neural network evaluation in scenarios such as Neural Architecture Search (NAS), as they do not require any training. Existing zero-shot metrics have several limitations, including weak correlation with true performance and poor generalisation across different networks or downstream tasks. For example, most of these metrics apply only to either convolutional neural networks (CNNs) or Transformers, but not both. To address these limitations, we propose Sample-Wise Activation Patterns (SWAP) and its derivative, SWAP-Score, a novel and highly effective zero-shot metric. SWAP-Score is broadly applicable across both architecture families and task domains, demonstrating strong predictive performance in the majority of tasks. This metric measures the expressivity of neural networks over a mini-batch of samples and shows a high correlation with the networks' ground-truth performance. For both CNNs and Transformers, the SWAP-Score outperforms existing zero-shot metrics across computer vision and natural language processing tasks. For instance, Spearman's correlation coefficient between the SWAP-Score and CIFAR-10 validation accuracy is 0.93 for DARTS CNNs, and the corresponding coefficient for FlexiBERT Transformers on GLUE tasks is 0.71. Moreover, SWAP-Score is label-independent and hence can be applied at the pre-training stage of language models to estimate their performance on downstream tasks. When applied to NAS, the SWAP-empowered method, SWAP-NAS, achieves competitive performance using only approximately 6 and 9 minutes of GPU time on CIFAR-10 and ImageNet, respectively. Our code is available at: https://github.com/pym1024/SWAP_Universal
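The abstract does not spell out how the score is computed, so the following is only a rough sketch of the general idea of measuring expressivity through activation patterns over a mini-batch. It assumes a plain ReLU MLP, binarises each unit's pre-activation (positive or not), reads each unit's pattern across the samples of the batch, and counts the distinct patterns. The helpers `relu_mlp_preactivations` and `samplewise_pattern_count` are hypothetical names, and this is not the authors' implementation (see their repository for that).

```python
import random


def relu_mlp_preactivations(batch, weights):
    """Collect every hidden unit's pre-activation value for every sample.

    batch:   list of input vectors (lists of floats)
    weights: list of layers, each a list of rows (one row per output unit)
    """
    per_sample = []
    for x in batch:
        values, h = [], x
        for layer in weights:
            z = [sum(w * v for w, v in zip(row, h)) for row in layer]
            values.extend(z)
            h = [max(0.0, v) for v in z]  # ReLU feeds the next layer
        per_sample.append(values)
    return per_sample


def samplewise_pattern_count(preactivations):
    """Count distinct binary patterns, one pattern per unit across samples."""
    binary = [[1 if v > 0 else 0 for v in row] for row in preactivations]
    # zip(*binary) transposes samples-by-units into units-by-samples.
    return len({tuple(column) for column in zip(*binary)})


# Illustrative usage with an untrained, randomly initialised network.
random.seed(0)
sizes = [8, 16, 16]  # input width followed by two hidden layers
weights = [
    [[random.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    for n_in, n_out in zip(sizes, sizes[1:])
]
batch = [[random.gauss(0.0, 1.0) for _ in range(sizes[0])] for _ in range(6)]
print(samplewise_pattern_count(relu_mlp_preactivations(batch, weights)))
```

Note that no labels appear anywhere in the computation: only inputs and untrained weights are needed, which is consistent with the label-independence claimed in the abstract.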
