CV | May 1

Intrinsic Gradient Suppression for Label-Noise Prompt Tuning in Vision-Language Models

Jiayu Li, Jiaxin Qi, Sheng Zhou, Jiaqiang Huang, Xiansheng Hua

arXiv:2605.00591 | 26.9 | h-index: 2

AI Analysis

For practitioners tuning vision-language models on noisy labels, DSPT is a simple, drop-in method that the authors report outperforms more complex approaches.

Prompt tuning in CLIP is highly sensitive to label noise. The authors propose Double-Softmax Prompt Tuning (DSPT), a hyperparameter-free method that suppresses gradients from noisy samples, achieving state-of-the-art robustness across various noisy benchmarks.

Contrastive vision-language models like CLIP exhibit remarkable zero-shot generalization. However, prompt tuning remains highly sensitive to label noise, as mislabeled samples generate disproportionately large gradients that can overwhelm pre-trained priors. We argue that because CLIP already provides a near-optimal initialization, adaptation should be inherently conservative, particularly against the extreme gradient updates common in noisy settings. To this end, we propose Double-Softmax Prompt Tuning (DSPT), a hyperparameter-free method for intrinsic gradient suppression. By applying a sequential probabilistic normalization, DSPT induces a self-adaptive saturation zone that suppresses gradients from high-error noisy samples while maintaining informative updates. We also provide both theoretical analysis and empirical evidence about how this mechanism achieves adaptive suppression. This design transforms "gradient vanishing", traditionally a training bottleneck, into a principled noise-filtering shield for label-noise prompt tuning. Extensive experiments confirm that this simple, drop-in design achieves state-of-the-art robustness across various noisy benchmarks, outperforming methods with complex architectures and handcrafted hyperparameters.
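The abstract does not spell out the exact formulation, so the following is only a minimal sketch of one plausible reading of "sequential probabilistic normalization": a second softmax applied to the output of the first, followed by the usual negative log-likelihood. The function names, the toy logits, and the omission of any CLIP temperature scaling are assumptions made for illustration, not the authors' implementation. The sketch compares gradient norms with respect to the logits for a confidently mispredicted sample (the high-error case a noisy label tends to produce) and for an uncertain one.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def ce_grad(z, y):
    # Gradient of standard cross-entropy -log softmax(z)[y] w.r.t. logits z.
    g = softmax(z)
    g[y] -= 1.0
    return g

def double_softmax_grad(z, y):
    # Gradient of -log softmax(softmax(z))[y] w.r.t. logits z.
    # ASSUMPTION: this loss form is an illustrative reading of the abstract.
    p = softmax(z)                 # first normalization
    g = softmax(p)                 # second normalization
    g[y] -= 1.0                    # dL/dp = softmax(p) - onehot(y)
    return p * (g - np.dot(p, g))  # chain rule through the first softmax

y = 0  # (possibly noisy) label
confident_wrong = np.array([-10.0, 10.0, 0.0])  # model strongly prefers class 1
uncertain = np.array([0.0, 0.5, 0.0])           # model is undecided

for name, z in [("confident_wrong", confident_wrong), ("uncertain", uncertain)]:
    ce_norm = np.linalg.norm(ce_grad(z, y))
    ds_norm = np.linalg.norm(double_softmax_grad(z, y))
    print(f"{name}: CE grad norm = {ce_norm:.3g}, double-softmax grad norm = {ds_norm:.3g}")
```

Under this assumed form, the first softmax saturates when the model confidently disagrees with the label, so its Jacobian shrinks the gradient toward zero, whereas plain cross-entropy yields its largest gradient in exactly that case. For the uncertain sample the gradient remains clearly nonzero, which is consistent with the abstract's description of a saturation zone that still allows informative updates.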
