CV | Mar 18, 2024

Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation

Stanford
arXiv:2403.11808v2 | 33 citations | h-index: 23 | NIPS
Originality: Incremental advance
AI Analysis

This work addresses the computational inefficiency of adapting pre-trained ViT models, which could enable broader use in resource-constrained settings. The advance is incremental, since it builds on existing PEFT methods.

The paper tackles the problem of improving both parameter and inference efficiency for vision transformer adaptation, proposing Dynamic Tuning (DyT) which achieves superior performance compared to existing methods while using only 71% of their FLOPs on the VTAB-1K benchmark.

Existing parameter-efficient fine-tuning (PEFT) methods have achieved significant success on vision transformer (ViT) adaptation by improving parameter efficiency. However, enhancing inference efficiency during adaptation remains underexplored. This limits the broader application of pre-trained ViT models, especially when the model is computationally expensive. In this paper, we propose Dynamic Tuning (DyT), a novel approach to improve both parameter and inference efficiency for ViT adaptation. Specifically, in addition to lightweight adapter modules, we propose a token dispatcher that distinguishes informative tokens from less important ones, allowing the latter to dynamically skip the original block and thereby reducing redundant computation during inference. Additionally, we explore multiple design variants to find the best practice for DyT. Finally, inspired by the mixture-of-experts (MoE) mechanism, we introduce an enhanced adapter to further boost adaptation performance. We validate DyT across various tasks, including image/video recognition and semantic segmentation. For instance, DyT achieves superior performance compared to existing PEFT methods while incurring only 71% of their FLOPs on the VTAB-1K benchmark.
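
The token-dispatch idea in the abstract can be illustrated with a small, dependency-free sketch. This is not the authors' implementation: the function names, the sigmoid gate with a fixed 0.5 threshold, the bottleneck adapter shape, and the choice to apply the adapter to every token are all assumptions made for illustration. It only shows how a per-token mask lets less important tokens bypass an expensive block while still receiving a cheap adapter update.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def token_dispatcher(tokens, w, bias=0.0, threshold=0.5):
    """Hypothetical gate: True means the token runs the expensive block."""
    scores = [1.0 / (1.0 + math.exp(-(dot(t, w) + bias))) for t in tokens]
    return [s >= threshold for s in scores]

def adapter(token, down, up):
    """Lightweight bottleneck (assumed form): down-project, ReLU, up-project."""
    hidden = [max(0.0, dot(token, row)) for row in down]
    return [dot(hidden, row) for row in up]

def dynamic_block(tokens, heavy_block, w, down, up):
    """Apply the adapter to all tokens; run heavy_block only on kept tokens."""
    mask = token_dispatcher(tokens, w)
    out = []
    for t, keep in zip(tokens, mask):
        delta = adapter(t, down, up)
        if keep:
            heavy = heavy_block(t)
            out.append([x + h + d for x, h, d in zip(t, heavy, delta)])
        else:
            # Skipped token: residual path plus the cheap adapter only.
            out.append([x + d for x, d in zip(t, delta)])
    return out, mask

# Toy stand-in for the frozen pre-trained block; records each call.
calls = []
def heavy_block(t):
    calls.append(t)
    return [2.0 * x for x in t]

tokens = [[1.0, 0.0], [-1.0, 0.0], [2.0, 1.0]]
w = [1.0, 0.0]              # gate weights (assumed values)
down = [[0.5, 0.5]]         # 2 -> 1 projection
up = [[1.0], [1.0]]         # 1 -> 2 projection
out, mask = dynamic_block(tokens, heavy_block, w, down, up)
print(mask, len(calls))
```

In this toy run the second token falls below the gate threshold, so the expensive block is invoked for two of the three tokens. The compute saved scales with the fraction of tokens skipped, which is the mechanism behind the reported FLOPs reduction.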

Code Implementations: 1 repo