CV | Jul 21, 2023

Tuning Pre-trained Model via Moment Probing

Mingze Gao, Qilong Wang, Zhenyi Lin, Pengfei Zhu, Qinghua Hu, Jingbo Zhou

arXiv:2307.11342v3 | 10.417 citations | h-index: 36 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses a bottleneck in efficient fine-tuning, namely the limited representational power of the linear probing head, though the advance is incremental because it builds on existing linear probing methods.

The paper aims to improve linear probing for fine-tuning pre-trained models. It proposes Moment Probing (MP), which classifies using moments of the feature distribution, and reports competitive or state-of-the-art performance on ten benchmarks at lower training cost.

Recently, efficient fine-tuning of large-scale pre-trained models has attracted increasing research interest, where linear probing (LP), as a fundamental module, is involved in exploiting the final representations for task-dependent classification. However, most existing methods focus on how to effectively introduce a few learnable parameters, and little work pays attention to the commonly used LP module. In this paper, we propose a novel Moment Probing (MP) method to further explore the potential of LP. Distinguished from LP, which builds a linear classification head based on the mean of final features (e.g., word tokens for ViT) or classification tokens, our MP applies a linear classifier to the feature distribution, which provides stronger representation ability by exploiting the richer statistical information inherent in features. Specifically, we represent the feature distribution by its characteristic function, which is efficiently approximated using the first- and second-order moments of features. Furthermore, we propose a multi-head convolutional cross-covariance (MHC$^3$) to compute second-order moments in an efficient and effective manner. Considering that MP could affect feature learning, we introduce a partially shared module to learn two recalibrating parameters (PSRP) for backbones based on MP, namely MP$_{+}$. Extensive experiments on ten benchmarks using various models show that our MP significantly outperforms LP and is competitive with counterparts at lower training cost, while our MP$_{+}$ achieves state-of-the-art performance.
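To make the contrast between LP and MP concrete, the following is a minimal NumPy sketch of the general idea: a linear head that sees both the first-order moment (the token mean that plain LP uses) and a compact second-order moment. It is not the authors' implementation. The per-head covariance below is a simplified stand-in for MHC$^3$, whose exact design the abstract does not specify, and the function names, the fixed random projection, and the `heads` and `reduced_dim` parameters are all hypothetical choices made for illustration.

```python
import numpy as np


def moment_features(tokens, heads=4, reduced_dim=16, seed=0):
    """Build a first- plus second-order moment descriptor for one sample.

    tokens: (N, D) array of final-layer token features.
    Returns a 1-D vector of length D + heads * (reduced_dim // heads) ** 2.
    """
    assert reduced_dim % heads == 0, "reduced_dim must be divisible by heads"
    n, d = tokens.shape

    # First-order moment: the token mean, i.e. the input of plain linear probing.
    mean = tokens.mean(axis=0)

    # Assumption: a fixed random projection stands in for a learned
    # dimension-reduction layer, keeping the second-order term small.
    rng = np.random.default_rng(seed)
    proj = rng.standard_normal((d, reduced_dim)) / np.sqrt(d)
    z = tokens @ proj
    z = z - z.mean(axis=0)  # center before computing covariance

    # Assumption: split channels into heads and take one covariance per head.
    # This is a simplified substitute for the paper's MHC^3 module.
    per_head = reduced_dim // heads
    blocks = z.reshape(n, heads, per_head)
    cov = np.einsum("nhi,nhj->hij", blocks, blocks) / n

    return np.concatenate([mean, cov.reshape(-1)])


def linear_logits(features, weight, bias):
    """Linear classification head applied to the moment descriptor."""
    return weight @ features + bias
```

With 32-dimensional tokens and the default settings, the descriptor has 32 mean entries plus 4 heads of 4 x 4 covariance entries. Splitting into heads keeps the second-order term at `heads * per_head ** 2` values rather than `reduced_dim ** 2`, which is one plausible way to hold down the size of the classifier.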

