CV · Aug 15, 2024

Training Spatial-Frequency Visual Prompts and Probabilistic Clusters for Accurate Black-Box Transfer Learning

arXiv:2408.07944v1 · 1 citation · h-index: 10
Originality: Incremental advance
AI Analysis

This work addresses the problem of efficiently adapting black-box vision models to real-world applications under data and resource constraints. It represents an incremental improvement over existing parameter-efficient transfer learning methods.

The paper tackles the challenge of adapting black-box pre-trained vision models to target domains with limited data and computational resources. It proposes a parameter-efficient transfer learning framework that generates spatial-frequency visual prompts and uses probabilistic clusters to enhance class separation. The method achieves superior few-shot transfer learning performance across multiple datasets while reducing computational costs.
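To make the idea of a spatial-frequency hybrid prompt more concrete, here is a minimal NumPy sketch that assumes additive prompts. A spatial prompt is added on an image border. A frequency prompt is added to the low-frequency amplitude spectrum, and the phase is kept. The function names (`spatial_prompt`, `frequency_prompt`, `hybrid_prompt`), the border and window shapes, and the composition order are hypothetical illustrations, not the paper's design. In the paper the prompts come from a trained prompter; here they are plain arrays supplied by the caller.

```python
import numpy as np


def spatial_prompt(image: np.ndarray, prompt: np.ndarray, pad: int) -> np.ndarray:
    """Add `prompt` on a border `pad` pixels wide; interior pixels are untouched."""
    assert pad > 0 and image.shape == prompt.shape
    border = np.ones(image.shape[:2], dtype=bool)
    border[pad:-pad, pad:-pad] = False  # exclude the interior from prompting
    out = image.copy()
    out[border] += prompt[border]
    return out


def frequency_prompt(image: np.ndarray, prompt: np.ndarray, radius: int) -> np.ndarray:
    """Add `prompt` to the amplitude spectrum inside a centered low-frequency window."""
    h, w = image.shape[:2]
    assert image.shape == prompt.shape and 0 <= radius < min(h, w) // 2
    # Per-channel 2-D FFT, shifted so the zero frequency sits at the center.
    spec = np.fft.fftshift(np.fft.fft2(image, axes=(0, 1)), axes=(0, 1))
    amp, phase = np.abs(spec), np.angle(spec)
    cy, cx = h // 2, w // 2
    window = np.zeros((h, w), dtype=bool)
    window[cy - radius:cy + radius + 1, cx - radius:cx + radius + 1] = True
    # Perturb only the amplitude (kept non-negative); the phase is preserved.
    amp[window] = np.maximum(amp[window] + prompt[window], 0.0)
    new_spec = np.fft.ifftshift(amp * np.exp(1j * phase), axes=(0, 1))
    # The prompt is not forced to be Hermitian-symmetric, so keep the real part.
    return np.real(np.fft.ifft2(new_spec, axes=(0, 1)))


def hybrid_prompt(image, freq_p, spat_p, radius=4, pad=3):
    """Hypothetical composition: frequency prompt first, then spatial, clipped to [0, 1]."""
    x = frequency_prompt(image, freq_p, radius)
    x = spatial_prompt(x, spat_p, pad)
    return np.clip(x, 0.0, 1.0)


# Usage: zero prompts should leave a [0, 1) image unchanged up to float error.
rng = np.random.default_rng(0)
img = rng.random((32, 32, 3))
zeros = np.zeros_like(img)
prompted = hybrid_prompt(img, zeros, zeros)
```

With zero prompts the pipeline returns its input unchanged, up to floating-point error, which makes a convenient sanity check. Because only the prompted image is sent to the model, applying prompts this way needs nothing from the black-box model beyond its predictions.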

Despite the growing prevalence of black-box pre-trained models (PTMs) such as prediction API services, directly applying these general models to real-world scenarios remains challenging because of the data distribution gap. Targeting a scenario with scarce data and constrained computational resources, this paper proposes a novel parameter-efficient transfer learning framework for vision recognition models in the black-box setting. Our framework incorporates two novel training techniques. First, we align the input space (i.e., images) of PTMs to the target data distribution by generating visual prompts in the spatial and frequency domains. Second, along with the spatial-frequency hybrid visual prompter, we design a training technique based on probabilistic clusters, which can enhance class separation in the output space (i.e., prediction probabilities). In experiments, our model demonstrates superior performance in a few-shot transfer learning setting across extensive visual recognition datasets, surpassing state-of-the-art baselines. Additionally, we show that the proposed method efficiently reduces computational costs in both the training and inference phases.
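The abstract does not spell out the probabilistic-cluster objective. The following is therefore only one plausible reading, sketched under stated assumptions. Each class's prediction-probability vectors are modeled as an isotropic Gaussian cluster. The loss is the cross-entropy of each sample's true cluster under the resulting cluster posterior, which is low when the clusters are well separated. The names `fit_class_clusters` and `cluster_separation_loss` and the Gaussian form are hypothetical, not the paper's formulation.

```python
import numpy as np


def fit_class_clusters(probs, labels, num_classes, eps=1e-6):
    """Fit one isotropic Gaussian (mean vector, scalar variance) per class."""
    means = np.stack([probs[labels == c].mean(axis=0) for c in range(num_classes)])
    var = np.array([probs[labels == c].var(axis=0).mean() + eps for c in range(num_classes)])
    return means, var


def cluster_separation_loss(probs, labels, means, var):
    """Mean cross-entropy of each sample's true cluster under the Gaussian cluster posterior."""
    d = probs.shape[1]
    sq = ((probs[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)  # (N, C) squared distances
    log_lik = -0.5 * sq / var - 0.5 * d * np.log(2 * np.pi * var)  # broadcast over classes
    m = log_lik.max(axis=1, keepdims=True)  # stable log-sum-exp over clusters
    log_post = log_lik - (m + np.log(np.exp(log_lik - m).sum(axis=1, keepdims=True)))
    return float(-log_post[np.arange(len(labels)), labels].mean())


# Usage on toy 2-class probability vectors (synthetic values, not from the paper).
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 20)


def two_class_probs(center0, center1):
    """Hypothetical helper: class-dependent p(class 0) plus small Gaussian noise."""
    p0 = np.where(labels == 0, center0, center1) + rng.normal(0.0, 0.01, labels.size)
    return np.stack([p0, 1.0 - p0], axis=1)


separated = two_class_probs(0.9, 0.1)
overlapping = two_class_probs(0.5, 0.5)
loss_sep = cluster_separation_loss(separated, labels, *fit_class_clusters(separated, labels, 2))
loss_mix = cluster_separation_loss(overlapping, labels, *fit_class_clusters(overlapping, labels, 2))
```

On this synthetic data, the well-separated classes give a near-zero loss and the overlapping classes give a clearly larger one. Computing such a score requires only the model's output probabilities, which is exactly what a black-box prediction API exposes.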

