CV AI

Apr 11, 2025

Parameter-Free Fine-tuning via Redundancy Elimination for Vision Foundation Models

Jiahuan Long, Tingsong Jiang, Wen Yao, Yizhe Xiong, Zhengqin Xu, Shuai Jia, Hanqing Liu, Chao Ma

arXiv:2504.08915v23.6

h-index: 15

Originality: Incremental advance

AI Analysis

This work addresses the computational and memory cost of fine-tuning large vision models. The approach is an incremental advance rather than a new paradigm, and it integrates with existing fine-tuning strategies.

The paper adapts vision foundation models to downstream tasks without updating any parameters. Its parameter-free fine-tuning method selects, reuses, and enhances pre-trained features, and it is evaluated on image segmentation, depth estimation, and image classification. Because channel selection requires only model inference, GPU memory overhead is significantly reduced.

Vision foundation models (VFMs) have demonstrated remarkable capabilities in learning universal visual representations. However, adapting these models to downstream tasks conventionally requires parameter updates, and even parameter-efficient fine-tuning methods modify thousands to millions of weights. In this paper, we investigate the redundancies in the Segment Anything Model (SAM) and then propose a novel parameter-free fine-tuning method. Unlike traditional fine-tuning methods that adjust parameters, our method emphasizes selecting, reusing, and enhancing pre-trained features, offering a new perspective on fine-tuning foundation models. Specifically, we introduce a channel selection algorithm based on the model's output difference to identify redundant and effective channels. By selectively replacing the redundant channels with more effective ones, we filter out less useful features and reuse features that are more relevant to downstream tasks, thereby enhancing the task-specific feature representation. Experiments on both out-of-domain and in-domain datasets demonstrate the efficiency and effectiveness of our method in different vision tasks (e.g., image segmentation, depth estimation, and image classification). Notably, our approach can seamlessly integrate with existing fine-tuning strategies (e.g., LoRA, Adapter), further boosting the performance of already fine-tuned models. Moreover, since our channel selection involves only model inference, our method significantly reduces GPU memory overhead.
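The channel-replacement idea in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not describe the search procedure, so the greedy loop below is an assumption, the linear `frozen_head` is a toy stand-in for a frozen model such as SAM, and all function names and data are hypothetical. It only shows how a channel mapping could be chosen from forward passes alone, by comparing the model's output error before and after a replacement, with no weight updates.

```python
import numpy as np


def frozen_head(features, weights):
    """Toy stand-in for the frozen model: a fixed linear readout over channels."""
    return features @ weights


def task_error(features, weights, targets):
    """Mean squared difference between the model output and the targets."""
    return float(np.mean((frozen_head(features, weights) - targets) ** 2))


def greedy_channel_replacement(features, weights, targets):
    """Search for a channel mapping using forward passes only (assumed greedy search).

    mapping[i] = j means slot i is filled with a copy of channel j.
    A replacement is kept only if it lowers the task error.
    """
    n_channels = features.shape[1]
    mapping = np.arange(n_channels)  # identity: every slot keeps its own channel
    best = task_error(features[:, mapping], weights, targets)
    for slot in range(n_channels):
        for donor in range(n_channels):
            if donor == mapping[slot]:
                continue
            trial = mapping.copy()
            trial[slot] = donor
            err = task_error(features[:, trial], weights, targets)
            if err < best:  # slot held a less useful channel; reuse the donor
                best, mapping = err, trial
    return mapping, best


# Hypothetical toy data: channel 0 carries the signal, channels 1 and 2 are noise.
rng = np.random.default_rng(0)
features = rng.standard_normal((512, 3))
weights = np.array([1.0, 1.0, 0.0])  # frozen; never updated
targets = 2.0 * features[:, 0]

baseline = task_error(features, weights, targets)
mapping, tuned = greedy_channel_replacement(features, weights, targets)
print("baseline error:", baseline)
print("channel mapping:", mapping, "error after replacement:", tuned)
```

In this toy setup the noisy channel in slot 1 should end up replaced by a copy of channel 0, which lowers the error while `weights` stays untouched. The search needs no gradients or optimizer state, which is consistent with the abstract's point that inference-only selection keeps GPU memory overhead low.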

