CV | May 8

TAS-LoRA: Transformer Architecture Search with Mixture-of-LoRA Experts

arXiv:2605.07256

AI Analysis

For researchers automating vision transformer design, TAS-LoRA provides a novel solution to the feature collapse bottleneck, enabling better subnet-specific feature learning without excessive computational cost.

TAS-LoRA addresses the feature collapse problem in transformer architecture search by introducing parameter-efficient low-rank adaptation with a Mixture-of-LoRA-Experts strategy, achieving significant performance improvements over state-of-the-art methods on ImageNet and multiple transfer learning benchmarks.

Transformer architecture search (TAS) automatically discovers optimal vision transformer (ViT) architectures, reducing the human effort of designing ViTs by hand. However, existing TAS methods suffer from the feature collapse problem: subnets within a supernet fail to learn subnet-specific features, mainly because they share the supernet's weights, and this limits the performance of individual subnets. To address this, we propose TAS-LoRA, a novel method that introduces parameter-efficient low-rank adaptation (LoRA) to enable subnet-specific feature learning while maintaining computational efficiency. TAS-LoRA incorporates a Mixture-of-LoRA-Experts (MoLE) strategy, in which a lightweight router dynamically assigns LoRA experts based on subnet architectures, and introduces a group-wise router initialization technique to encourage diverse feature learning across experts early in training. Extensive experiments on ImageNet and several transfer learning benchmarks, including CIFAR-10/100, Flowers, CARS, and INAT-19, demonstrate that TAS-LoRA effectively mitigates feature collapse and significantly improves performance over state-of-the-art TAS methods.
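
To make the MoLE idea in the abstract concrete, here is a minimal sketch of a linear layer with one shared weight and several LoRA experts whose contributions are mixed by a router that sees only an encoding of the subnet architecture. It is an illustration under stated assumptions, not the authors' implementation: the class name `MoLELinear`, the two-element architecture encoding, the softmax gating, and all sizes are hypothetical, and the group-wise router initialization described in the paper is not reproduced here (the router is simply initialized at random).

```python
import math
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    e = [math.exp(t - m) for t in z]
    s = sum(e)
    return [t / s for t in e]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class MoLELinear:
    """Shared linear layer plus a router-weighted mixture of LoRA experts.

    Hypothetical sketch: y = W x + sum_k g_k(arch) * B_k (A_k x),
    where g(arch) = softmax(R @ arch) depends only on the subnet encoding.
    """

    def __init__(self, d_in, d_out, rank, n_experts, arch_dim, seed=0):
        rng = random.Random(seed)
        # Weight shared by every subnet in the supernet.
        self.W = rand_matrix(d_out, d_in, rng)
        # Standard LoRA initialization: A random, B zero, so each expert
        # contributes nothing until it has been trained.
        self.A = [rand_matrix(rank, d_in, rng) for _ in range(n_experts)]
        self.B = [[[0.0] * rank for _ in range(d_out)] for _ in range(n_experts)]
        # Lightweight router (assumption: random init, not the paper's
        # group-wise initialization).
        self.R = rand_matrix(n_experts, arch_dim, rng)

    def gates(self, arch):
        """Mixing weights over experts for a given architecture encoding."""
        return softmax(matvec(self.R, arch))

    def forward(self, x, arch):
        y = matvec(self.W, x)
        for g, A, B in zip(self.gates(arch), self.A, self.B):
            delta = matvec(B, matvec(A, x))  # low-rank update B (A x)
            y = [yi + g * di for yi, di in zip(y, delta)]
        return y


layer = MoLELinear(d_in=4, d_out=3, rank=2, n_experts=3, arch_dim=2)
x = [1.0, -2.0, 0.5, 3.0]
arch = [0.5, 1.0]  # hypothetical encoding, e.g. normalized depth and width
print(layer.gates(arch))
print(layer.forward(x, arch))
```

Because the gates depend on the architecture encoding rather than on the input token, two subnets that share `W` can still receive different low-rank updates, which is the mechanism the abstract credits with enabling subnet-specific features. Each expert adds only `rank * (d_in + d_out)` parameters on top of the shared weight.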

