CV | AI | LG | Mar 15, 2024

When Training-Free NAS Meets Vision Transformer: A Neural Tangent Kernel Perspective

arXiv:2405.04536v1 | 3 citations | h-index: 3 | ICASSP
Originality: Incremental advance
AI Analysis

This paper addresses a bottleneck in efficient architecture search for vision transformers, though the advance is incremental because it builds on existing NTK methods.

The paper tackles the inefficacy of Neural Tangent Kernel (NTK) metrics for training-free neural architecture search (NAS) in vision transformers (ViT). It proposes ViNTK, which incorporates high-frequency signals through Fourier features of the inputs, and reports a significant reduction in search cost while maintaining similar performance on image classification and semantic segmentation tasks.

This paper investigates the Neural Tangent Kernel (NTK) to search vision transformers without training. In contrast with the previous observation that NTK-based metrics can effectively predict the performance of CNNs at initialization, we empirically show their inefficacy in the ViT search space. We hypothesize that the fundamental feature learning preference within ViT contributes to the ineffectiveness of applying NTK to NAS for ViT. We both theoretically and empirically validate that NTK essentially estimates the ability of neural networks to learn low-frequency signals, completely ignoring the impact of high-frequency signals in feature learning. To address this limitation, we propose a new method called ViNTK that generalizes the standard NTK to the high-frequency domain by integrating the Fourier features from inputs. Experiments with multiple ViT search spaces on image classification and semantic segmentation tasks show that our method can significantly reduce search cost over prior state-of-the-art NAS for ViT while maintaining similar performance on searched architectures.
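
To make the idea in the abstract concrete, the sketch below computes an empirical NTK on Fourier-mapped inputs. It is an illustration only and not the authors' ViNTK implementation: the two-layer ReLU network, the frequency set `freqs`, and the helper names `fourier_features`, `empirical_ntk`, and `ntk_matrix` are all assumptions introduced here. The mapping sends an input x to the pairs cos(2πb·x), sin(2πb·x) for each frequency vector b, and the kernel entry for two inputs is the inner product of the network's parameter gradients at those inputs.

```python
import math
import random


def fourier_features(x, freqs):
    """Map x to [cos(2*pi*b.x), sin(2*pi*b.x)] for each frequency vector b."""
    feats = []
    for b in freqs:
        phase = 2.0 * math.pi * sum(bi * xi for bi, xi in zip(b, x))
        feats.append(math.cos(phase))
        feats.append(math.sin(phase))
    return feats


def empirical_ntk(z1, z2, W, a):
    """Empirical NTK of f(z) = (1/sqrt(m)) * sum_j a[j] * relu(W[j] . z).

    The value is the inner product of the parameter gradients at z1 and z2,
    written out analytically for this toy (hypothetical) network.
    """
    m = len(W)
    dot_zz = sum(p * q for p, q in zip(z1, z2))
    total = 0.0
    for w_j, a_j in zip(W, a):
        pre1 = sum(w * p for w, p in zip(w_j, z1))
        pre2 = sum(w * q for w, q in zip(w_j, z2))
        # Gradients w.r.t. a_j contribute relu(pre1) * relu(pre2).
        total += max(pre1, 0.0) * max(pre2, 0.0)
        # Gradients w.r.t. w_j contribute a_j^2 * (z1 . z2) when both units fire.
        if pre1 > 0.0 and pre2 > 0.0:
            total += a_j * a_j * dot_zz
    return total / m


def ntk_matrix(inputs, freqs, width=64, seed=0):
    """Build the NTK Gram matrix over Fourier-mapped inputs at random init."""
    rng = random.Random(seed)
    feats = [fourier_features(x, freqs) for x in inputs]
    dim = len(feats[0])
    W = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(width)]
    a = [rng.gauss(0.0, 1.0) for _ in range(width)]
    return [[empirical_ntk(zi, zj, W, a) for zj in feats] for zi in feats]


# Toy inputs and an assumed frequency set that mixes low and high frequencies.
inputs = [[0.1, 0.2], [0.4, 0.9], [0.7, 0.3]]
freqs = [[1.0, 0.0], [0.0, 1.0], [4.0, 4.0]]
K = ntk_matrix(inputs, freqs)
```

In a training-free search, a scalar statistic of a Gram matrix such as `K` would be computed per candidate architecture at initialization and used to rank candidates. Which statistic ViNTK uses, and how the Fourier features enter a ViT, is not specified in the abstract, so that step is left out here.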

