CV Jun 25, 2024

Minimal Interaction Separated Tuning: A New Paradigm for Visual Adaptation

arXiv:2406.17559v33.72 citations

Originality Highly original

AI Analysis

This paper addresses the problem of visual adaptation for users with low computational resources, offering an incremental improvement over existing separated tuning methods.

The paper tackles the challenge of fine-tuning large vision models on resource-constrained devices by proposing Minimal Interaction Separated Tuning (MIST), which uses a lightweight adaptor to achieve competitive results on visual adaptation benchmarks with minimal information transfer.

The rapid scaling of large vision pretrained models makes fine-tuning increasingly difficult on devices with low computational resources. We explore a new visual adaptation paradigm called separated tuning, which treats large pretrained models as standalone feature extractors that run on powerful cloud servers. The fine-tuning is carried out on devices that possess only low computational resources (slow CPU, no GPU, small memory, etc.). Existing methods that are potentially suitable for our separated tuning paradigm are discussed. However, three major drawbacks hinder their application in separated tuning: low adaptation capability, large adapter networks, and, in particular, high information transfer overhead. To address these issues, we propose Minimal Interaction Separated Tuning, or MIST, which reveals that the sum of intermediate features from pretrained models not only has minimal information transfer but also has high adaptation capability. With a lightweight attention-based adaptor network, MIST achieves information transfer efficiency, parameter efficiency, and computational and memory efficiency, and at the same time demonstrates competitive results on various visual adaptation benchmarks.
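To make the information-transfer argument concrete, the following is a minimal sketch of the summation idea only, not the authors' implementation. It assumes every intermediate layer produces a feature map of the same shape (tokens by dimension). The layer count, token count, and feature dimension are hypothetical values chosen for illustration, and the attention-based adaptor that would run on the device is omitted.

```python
import random


def sum_intermediate_features(layer_features):
    """Element-wise sum of per-layer feature maps.

    layer_features: list of layers, each a [tokens][dim] nested list.
    All layers are assumed to share the same shape.
    """
    num_tokens = len(layer_features[0])
    dim = len(layer_features[0][0])
    summed = [[0.0] * dim for _ in range(num_tokens)]
    for feats in layer_features:
        for t in range(num_tokens):
            for d in range(dim):
                summed[t][d] += feats[t][d]
    return summed


def transfer_size(num_layers, num_tokens, dim, summed):
    """Number of floats sent from the server to the device per image."""
    if summed:
        return num_tokens * dim  # one summed feature map
    return num_layers * num_tokens * dim  # every intermediate feature map


# Hypothetical sizes, not taken from the paper.
NUM_LAYERS, NUM_TOKENS, DIM = 12, 197, 768

# Tiny random stand-in for server-side features (3 tokens, 4 dims per layer).
rng = random.Random(0)
small = [
    [[rng.random() for _ in range(4)] for _ in range(3)]
    for _ in range(NUM_LAYERS)
]
summed_small = sum_intermediate_features(small)

all_layers = transfer_size(NUM_LAYERS, NUM_TOKENS, DIM, summed=False)
summed_only = transfer_size(NUM_LAYERS, NUM_TOKENS, DIM, summed=True)
print("floats sent, all layers:", all_layers)
print("floats sent, summed:", summed_only)
print("reduction factor:", all_layers // summed_only)
```

Under these assumptions, sending the single summed map reduces the transferred volume by a factor equal to the number of layers, which is the kind of saving the abstract refers to as minimal information transfer.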
