CV · AI · Nov 7, 2023

Mini but Mighty: Finetuning ViTs with Mini Adapters

Imad Eddine Marouf, Enzo Tartaglione, Stéphane Lathuilière

arXiv:2311.03873v18.413 citations · h-index: 21 · Has Code

Originality: Incremental advance

AI Analysis

This work improves parameter-efficient transfer learning for computer vision practitioners, though the advance is incremental because it builds on existing adapter methods.

The paper tackles the poor performance of small adapters in fine-tuning Vision Transformers by proposing MiMi, a training framework that starts with large adapters and iteratively reduces their size, outperforming existing methods in accuracy-parameter trade-offs across 29 datasets.

Vision Transformers (ViTs) have become one of the dominant architectures in computer vision, and pre-trained ViT models are commonly adapted to new tasks via fine-tuning. Recent works proposed several parameter-efficient transfer learning methods, such as adapters, to avoid the prohibitive training and storage cost of finetuning. In this work, we observe that adapters perform poorly when the dimension of adapters is small, and we propose MiMi, a training framework that addresses this issue. We start with large adapters which can reach high performance, and iteratively reduce their size. To enable automatic estimation of the hidden dimension of every adapter, we also introduce a new scoring function, specifically designed for adapters, that compares the neuron importance across layers. Our method outperforms existing methods in finding the best trade-off between accuracy and trained parameters across the three dataset benchmarks DomainNet, VTAB, and Multi-task, for a total of 29 datasets.
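The start-large-then-shrink idea in the abstract can be illustrated with a minimal, standard-library sketch. Everything here is an assumption made for illustration: the names `make_adapter`, `neuron_scores`, `prune_step`, and `shrink` are hypothetical, the importance score (mean absolute weight attached to each hidden neuron) is a simple stand-in and not the scoring function proposed in the paper, and the `retrain` callback is a placeholder for the fine-tuning that would happen between pruning rounds.

```python
import random


def make_adapter(d_model, hidden, rng):
    """Bottleneck adapter stored as two plain weight matrices (toy stand-in)."""
    down = [[rng.gauss(0, 0.02) for _ in range(d_model)] for _ in range(hidden)]
    up = [[rng.gauss(0, 0.02) for _ in range(hidden)] for _ in range(d_model)]
    return {"down": down, "up": up}


def hidden_size(adapter):
    return len(adapter["down"])


def neuron_scores(adapter):
    """ASSUMED score: mean |w| over all weights touching hidden neuron j.

    Averaging (rather than summing) keeps scores comparable across adapters,
    which is what allows a single global ranking over all layers.
    """
    down, up = adapter["down"], adapter["up"]
    scores = []
    for j, row in enumerate(down):
        total = sum(abs(w) for w in row) + sum(abs(up_row[j]) for up_row in up)
        scores.append(total / (len(row) + len(up)))
    return scores


def prune_step(adapters, n_remove):
    """Drop the n_remove lowest-scoring hidden neurons across ALL adapters."""
    ranked = sorted(
        (score, a_idx, j)
        for a_idx, adapter in enumerate(adapters)
        for j, score in enumerate(neuron_scores(adapter))
    )
    drop = {}
    for _, a_idx, j in ranked[:n_remove]:
        drop.setdefault(a_idx, set()).add(j)
    for a_idx, dropped in drop.items():
        adapter = adapters[a_idx]
        keep = [j for j in range(hidden_size(adapter)) if j not in dropped]
        adapter["down"] = [adapter["down"][j] for j in keep]
        adapter["up"] = [[row[j] for j in keep] for row in adapter["up"]]


def shrink(adapters, target_total, fraction=0.5, retrain=None):
    """Iteratively reduce the total hidden size until target_total is reached."""
    total = sum(hidden_size(a) for a in adapters)
    while total > target_total:
        if retrain is not None:
            retrain(adapters)  # hypothetical fine-tuning between rounds
        n_remove = min(max(1, int(total * fraction)), total - target_total)
        prune_step(adapters, n_remove)
        total -= n_remove
    return [hidden_size(a) for a in adapters]
```

Because the ranking is global, each adapter can end up with a different hidden dimension, and some may shrink much more than others. For example, `shrink([make_adapter(8, 16, random.Random(0)) for _ in range(3)], target_total=12)` returns one hidden size per adapter, summing to 12.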
