CL · Mar 24, 2025

ZeroLM: Data-Free Transformer Architecture Search for Language Models

arXiv:2503.18646v1 · 2 citations · h-index: 26
Originality: Incremental advance
AI Analysis

The paper gives machine learning researchers and practitioners a practical way to run large-scale architecture search more efficiently, though the advance is incremental because it builds on existing zero-cost proxy methods.

The paper tackles the high computational cost of neural architecture search (NAS) for Transformer-based language models by introducing a zero-cost proxy that quantifies model capacity through weight statistics and decomposes architectures into sub-modules. It reports a Spearman's rho of 0.76 and a Kendall's tau of 0.53 on the FlexiBERT benchmark, which the authors present as outperforming existing zero-cost proxies.
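For readers unfamiliar with the two ranking metrics, the following is a minimal sketch of how Spearman's rho and Kendall's tau compare a proxy's scores against ground-truth accuracies. The numbers in `proxy_scores` and `true_accuracy` are made up for illustration and are not taken from the paper or from FlexiBERT; the Kendall variant shown is tau-a, which may differ from the variant the authors used.

```python
from itertools import combinations


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical data: one proxy score and one measured accuracy per architecture.
proxy_scores = [0.2, 0.5, 0.4, 0.9, 0.7]
true_accuracy = [71.0, 74.5, 75.0, 80.1, 78.3]

print(round(spearman_rho(proxy_scores, true_accuracy), 3))  # 0.9
print(round(kendall_tau(proxy_scores, true_accuracy), 3))   # 0.8
```

In this toy ranking a single swapped pair out of ten lowers tau to 0.8, while rho stays at 0.9. Kendall's tau is typically the smaller of the two on the same data, which is consistent with the reported 0.53 versus 0.76.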

Neural architecture search (NAS) provides a systematic framework for automating the design of neural network architectures, yet its widespread adoption is hindered by prohibitive computational requirements. Existing zero-cost proxy methods, while reducing search overhead, demonstrate inadequate performance in architecture ranking tasks, particularly for Transformer-based models where they often underperform simple parameter counting metrics. Current automated proxy discovery approaches suffer from extended search times, susceptibility to data overfitting, and structural complexity. This paper introduces a novel zero-cost proxy methodology that quantifies model capacity through efficient weight statistics computation while decomposing Transformer architectures into functionally distinct sub-modules, thereby optimizing the balance of their contributions to overall performance. Our comprehensive evaluation demonstrates the superiority of this approach, achieving a Spearman's rho of 0.76 and Kendall's tau of 0.53 on the FlexiBERT benchmark. The proposed method exhibits exceptional computational efficiency while maintaining robust performance across diverse NAS benchmark tasks, offering a practical solution for large-scale architecture search.
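The general idea in the abstract, scoring an architecture from weight statistics per sub-module without any training data, can be sketched as follows. This is not the paper's formula: the statistic (sum of absolute weight values), the split into `attention` and `ffn` groups, the mixing coefficient `alpha`, and all function names are assumptions chosen only to illustrate the shape of such a proxy.

```python
import random


def init_matrix(rows, cols, rng, std=0.02):
    """Randomly initialised weight matrix; no training data is involved."""
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


def build_layer(hidden, ffn_dim, rng):
    """Weights for one Transformer layer, grouped by sub-module (hypothetical layout)."""
    return {
        # Query, key, value, and output projections.
        "attention": [init_matrix(hidden, hidden, rng) for _ in range(4)],
        # Up- and down-projection of the feed-forward block.
        "ffn": [init_matrix(hidden, ffn_dim, rng), init_matrix(ffn_dim, hidden, rng)],
    }


def weight_statistic(matrices):
    """Sum of absolute weight values (an illustrative statistic, not the paper's)."""
    return sum(abs(w) for matrix in matrices for row in matrix for w in row)


def proxy_score(layers, alpha=0.5):
    """Blend per-sub-module statistics; alpha balances attention against FFN."""
    attention = sum(weight_statistic(layer["attention"]) for layer in layers)
    ffn = sum(weight_statistic(layer["ffn"]) for layer in layers)
    return alpha * attention + (1.0 - alpha) * ffn


rng = random.Random(0)
small = [build_layer(hidden=8, ffn_dim=32, rng=rng) for _ in range(2)]
large = [build_layer(hidden=16, ffn_dim=64, rng=rng) for _ in range(4)]

# Candidates are ranked by score alone; no forward pass or dataset is needed.
print(proxy_score(small) < proxy_score(large))
```

Separating the sub-modules lets their contributions be weighted independently, which is the balancing step the abstract describes. In a real search the weighting would be chosen to maximise rank correlation with measured performance on a benchmark.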

