CV | AI | LG | Nov 12, 2023

Aggregate, Decompose, and Fine-Tune: A Simple Yet Effective Factor-Tuning Method for Vision Transformer

arXiv:2311.06749v1 | 4 citations | h-index: 14 | Has Code
Originality: Incremental advance
AI Analysis

This paper addresses the need for efficient fine-tuning in vision tasks. It offers a simple and effective method that is an incremental advance over existing approaches such as LoRA and FacT.

The paper tackles the problem of inner- and cross-layer redundancy in parameter-efficient fine-tuning for Vision Transformers by introducing EFFT, which achieves state-of-the-art performance with 75.9% top-1 accuracy on VTAB-1K using only 0.28% of full fine-tuning parameters.

Recent advancements have illuminated the efficacy of some tensorization-decomposition Parameter-Efficient Fine-Tuning methods like LoRA and FacT in the context of Vision Transformers (ViT). However, these methods grapple with the challenges of inadequately addressing inner- and cross-layer redundancy. To tackle this issue, we introduce EFfective Factor-Tuning (EFFT), a simple yet effective fine-tuning method. Within the VTAB-1K dataset, our EFFT surpasses all baselines, attaining state-of-the-art performance with a categorical average of 75.9% in top-1 accuracy with only 0.28% of the parameters for full fine-tuning. Considering the simplicity and efficacy of EFFT, it holds the potential to serve as a foundational benchmark. The code and model are now available at https://github.com/Dongping-Chen/EFFT-EFfective-Factor-Tuning.
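To make the parameter-efficiency idea concrete, here is a minimal NumPy sketch of a generic LoRA-style low-rank update and the fraction of trainable parameters it needs for a single weight matrix. This is not the EFFT method and does not reproduce the 0.28% figure. The hidden size of 768, the rank of 8, and the helper names `init_low_rank_factors` and `trainable_fraction` are illustrative assumptions, not taken from the paper or its repository.

```python
import numpy as np


def init_low_rank_factors(d_out, d_in, rank, seed=0):
    """Create factors B (d_out x rank) and A (rank x d_in) for an update B @ A.

    B starts at zero so the adapted weight initially equals the frozen weight
    (the usual LoRA-style initialisation).
    """
    rng = np.random.default_rng(seed)
    A = rng.normal(scale=0.01, size=(rank, d_in))
    B = np.zeros((d_out, rank))
    return B, A


def trainable_fraction(d_out, d_in, rank):
    """Ratio of low-rank trainable parameters to the full matrix size."""
    full_params = d_out * d_in
    low_rank_params = rank * (d_out + d_in)
    return low_rank_params / full_params


# Illustrative sizes only (assumed, not from the paper).
d = 768
rank = 8

# Frozen pretrained weight, random here as a stand-in.
W = np.random.default_rng(1).normal(size=(d, d))

# Only B and A would be trained; W stays fixed.
B, A = init_low_rank_factors(d, d, rank)
W_adapted = W + B @ A

fraction = trainable_fraction(d, d, rank)
print(f"Trainable fraction for one {d}x{d} matrix at rank {rank}: {fraction:.4%}")
```

For one matrix, the trainable fraction is `rank * (d_out + d_in) / (d_out * d_in)`, so it shrinks as the rank drops. Methods such as FacT and EFFT reduce the total further by sharing factors, which this single-matrix sketch does not model.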

Code Implementations: 2 repos
