LG · CV · Jul 17, 2025

DASViT: Differentiable Architecture Search for Vision Transformer

arXiv:2507.13079v1 · h-index: 3 · IJCNN
Originality: Highly original
AI Analysis

This work addresses the problem of inefficient and computationally intensive architecture search for Vision Transformers, offering a more efficient solution for researchers and practitioners in computer vision.

The paper automates neural architecture design for Vision Transformers by introducing DASViT, a differentiable search method that discovers novel architectures. The resulting models outperform ViT-B/16 on multiple datasets while using fewer parameters and FLOPs.

Designing effective neural networks is a cornerstone of deep learning, and Neural Architecture Search (NAS) has emerged as a powerful tool for automating this process. Among the existing NAS approaches, Differentiable Architecture Search (DARTS) has gained prominence for its efficiency and ease of use, inspiring numerous advancements. Since the rise of Vision Transformers (ViT), researchers have applied NAS to explore ViT architectures, often focusing on macro-level search spaces and relying on discrete methods like evolutionary algorithms. While these methods ensure reliability, they face challenges in discovering innovative architectural designs, demand extensive computational resources, and are time-intensive. To address these limitations, we introduce Differentiable Architecture Search for Vision Transformer (DASViT), which bridges the gap in differentiable search for ViTs and uncovers novel designs. Experiments show that DASViT delivers architectures that break traditional Transformer encoder designs, outperform ViT-B/16 on multiple datasets, and achieve superior efficiency with fewer parameters and FLOPs.
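
As background on what "differentiable" means in this context, the following is a minimal Python sketch of the continuous relaxation used by DARTS in general: the discrete choice among candidate operations is replaced by a softmax-weighted mixture, so the architecture weights can be optimized by gradient descent and later discretized by keeping the strongest operation. It is not DASViT's search space or implementation. The candidate operations, the scalar inputs, and the names `CANDIDATE_OPS`, `mixed_op`, and `discretize` are all assumptions made for illustration.

```python
import math

# Hypothetical candidate operations acting on a scalar input.
# A real search space would use layers such as attention or convolution.
CANDIDATE_OPS = {
    "identity": lambda x: x,
    "double": lambda x: 2.0 * x,
    "zero": lambda x: 0.0,
}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_op(x, alphas):
    """Continuous relaxation: softmax-weighted sum of all candidate ops."""
    weights = softmax(alphas)
    return sum(w * op(x) for w, op in zip(weights, CANDIDATE_OPS.values()))


def discretize(alphas):
    """After search, keep only the op with the largest architecture weight."""
    names = list(CANDIDATE_OPS)
    best = max(range(len(alphas)), key=lambda i: alphas[i])
    return names[best]


if __name__ == "__main__":
    alphas = [0.1, 2.0, -1.0]  # assumed architecture weights after search
    print(mixed_op(1.0, alphas))
    print(discretize(alphas))
```

Because `mixed_op` is a smooth function of `alphas`, an automatic-differentiation framework could update the architecture weights alongside the network weights; this sketch omits that training loop.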
