CV · Apr 20

Spike-NVPT: Learning Robust Visual Prompts via Bio-Inspired Temporal Filtering and Discretization

arXiv:2604.18284 · 48.6 · h-index: 15
Predicted impact: top 71% in CV · last 90 days
Originality: Incremental advance
AI Analysis

For practitioners deploying pre-trained vision models in noisy environments, this method offers a parameter-efficient way to enhance robustness without inference overhead.

Spike-NVPT introduces a noise-robust visual prompt tuning method using spiking neurons to filter noise and discretize prompts into binary form, achieving up to 11.2% improvement in robustness while maintaining competitive clean accuracy.

Pre-trained vision models have found widespread application across diverse domains. Prompt tuning-based methods have emerged as a parameter-efficient paradigm for adapting pre-trained vision models. While effective on standard benchmarks, the continuous and dense nature of learned prompts can make them sensitive to input noise, as the high-capacity prompts tend to overfit task-irrelevant details. To address this issue, we propose Spike-NVPT, a noise-robust visual prompt tuning method. Specifically, we design a Signal Filtering Layer based on spiking neurons, which uses the integrate-and-fire (IF) mechanism to accumulate task-relevant signals over time and filter transient noise fluctuations. A subsequent Spike Discretization Unit converts filtered signals into sparse binary prompts. This discretization acts as a strong regularizer, forcing the model to anchor decision boundaries on the most discriminative and robust features. Notably, the resulting binary prompts remain static during deployment, ensuring zero additional computational overhead during inference. Experimental results demonstrate that Spike-NVPT achieves superior robustness, with a maximum improvement of 11.2% over conventional methods, and retains competitive accuracy on clean datasets. To the best of our knowledge, this is the first attempt to leverage spiking neurons for fine-tuning traditional artificial neural network (ANN)-based visual models.
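
The abstract's filter-then-binarize idea can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the soft reset, the threshold of 1.0, and the firing-rate rule used to binarize are all assumptions chosen for illustration. It only shows how an integrate-and-fire neuron lets a sustained signal through while a one-off fluctuation never reaches threshold.

```python
def integrate_and_fire(currents, threshold=1.0):
    """Toy integrate-and-fire neuron (hypothetical helper, not from the paper).

    Accumulates input current in a membrane potential and emits a spike (1)
    whenever the potential reaches the threshold. After a spike the threshold
    is subtracted (a "soft reset", which is an assumption made here).
    """
    membrane = 0.0
    spikes = []
    for current in currents:
        membrane += current            # integrate
        if membrane >= threshold:      # fire
            spikes.append(1)
            membrane -= threshold      # soft reset (assumed)
        else:
            spikes.append(0)
    return spikes


def discretize(spike_trains, min_rate=0.5):
    """Map each spike train to one binary value by its firing rate.

    The rate rule and min_rate are illustrative assumptions; the abstract does
    not say how the Spike Discretization Unit forms the binary prompt.
    """
    return [1 if sum(train) / len(train) >= min_rate else 0
            for train in spike_trains]


# A sustained (task-relevant) input versus a single transient fluctuation.
steady = [0.5] * 8
transient = [0.0, 0.75, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

trains = [integrate_and_fire(steady), integrate_and_fire(transient)]
prompt = discretize(trains)

print(trains[0])  # [0, 1, 0, 1, 0, 1, 0, 1]
print(trains[1])  # [0, 0, 0, 0, 0, 0, 0, 0]
print(prompt)     # [1, 0]
```

In this sketch the binary values are computed once and could then be stored as constants, which mirrors the abstract's point that static binary prompts add no computation at inference time.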

