CV · Apr 24, 2025

SDVPT: Semantic-Driven Visual Prompt Tuning for Open-World Object Counting

arXiv:2504.17395v1 · 2 citations · h-index: 8 · MM
Originality: Incremental advance
AI Analysis

This work addresses counting arbitrary objects in images, with applications such as surveillance and autonomous driving. The advance is incremental: it adds a new prompt-tuning method on top of existing models rather than proposing a new counting architecture.

The paper tackles the problem of limited generalizability in open-world object counting for unseen categories by proposing SDVPT, a plug-and-play framework that transfers knowledge from training to unseen categories with minimal overhead, achieving improved performance across multiple datasets.

Open-world object counting leverages the robust text-image alignment of pre-trained vision-language models (VLMs) to count arbitrary categories in images specified by textual queries. However, widely adopted naive fine-tuning strategies concentrate exclusively on text-image consistency for the categories seen during training, which limits generalizability to unseen categories. In this work, we propose a plug-and-play Semantic-Driven Visual Prompt Tuning framework (SDVPT) that transfers knowledge from the training set to unseen categories with minimal overhead in parameters and inference time. First, we introduce a two-stage visual prompt learning strategy composed of Category-Specific Prompt Initialization (CSPI) and Topology-Guided Prompt Refinement (TGPR). CSPI generates category-specific visual prompts, and TGPR then distills latent structural patterns from the VLM's text encoder to refine these prompts. During inference, we dynamically synthesize the visual prompts for unseen categories based on the semantic correlation between unseen and training categories, facilitating robust text-image alignment for unseen categories. Extensive experiments integrating SDVPT with all available open-world object counting models demonstrate its effectiveness and adaptability across three widely used datasets: FSC-147, CARPK, and PUCPR+.
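
The inference-time step described in the abstract, synthesizing a visual prompt for an unseen category from its semantic correlation with training categories, can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: the function name `synthesize_prompt`, the use of cosine similarity between text embeddings, the softmax weighting, and the `temperature` parameter are all hypothetical and may differ from what SDVPT actually does.

```python
import numpy as np

def synthesize_prompt(unseen_text_emb, train_text_embs, train_prompts, temperature=0.1):
    """Blend learned training-category prompts into a prompt for an unseen category.

    Assumed shapes (illustrative, not taken from the paper):
      unseen_text_emb: (D_text,)       text embedding of the unseen category
      train_text_embs: (C, D_text)     text embeddings of the C training categories
      train_prompts:   (C, P, D_vis)   one learned visual prompt per training category
    """
    # Cosine similarity between the unseen category and every training category.
    u = unseen_text_emb / np.linalg.norm(unseen_text_emb)
    t = train_text_embs / np.linalg.norm(train_text_embs, axis=1, keepdims=True)
    sims = t @ u  # shape (C,)

    # Temperature-scaled softmax turns similarities into mixing weights.
    logits = sims / temperature
    weights = np.exp(logits - logits.max())  # subtract max for numerical stability
    weights /= weights.sum()

    # Weighted sum over the category axis: (C,) with (C, P, D_vis) -> (P, D_vis).
    prompt = np.tensordot(weights, train_prompts, axes=1)
    return prompt, weights
```

Under these assumptions, an unseen category whose text embedding lies close to one training category receives a prompt dominated by that category's learned prompt, while a category that sits between several training categories receives a blend. A lower `temperature` makes the mixture sharper.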

