CV · Apr 15, 2025

DMPT: Decoupled Modality-aware Prompt Tuning for Multi-modal Object Re-identification

Minghui Lin, Shu Wang, Xiang Wang, Jianhua Tang, Longbin Fu, Zhengrong Zuo, Nong Sang

arXiv:2504.10985v18.42 citations · h-index: 3 · WACV

Originality: Incremental advance

AI Analysis

The paper addresses the high computational and storage costs that full fine-tuning imposes on computer vision researchers and practitioners. The advance is incremental, since it builds on existing prompt-tuning ideas.

The paper tackles the computational inefficiency of fully fine-tuning large pre-trained models for multi-modal object re-identification. It proposes DMPT, a prompt-tuning framework that freezes the backbone and fine-tunes only 6.5% of the backbone parameters, while achieving results competitive with state-of-the-art methods on common benchmarks.

Current multi-modal object re-identification approaches based on large-scale pre-trained backbones (e.g., ViT) have made remarkable progress and achieved excellent performance. However, these methods usually adopt the standard full fine-tuning paradigm, which requires optimizing a considerable number of backbone parameters and therefore incurs extensive computational and storage costs. In this work, we propose an efficient prompt-tuning framework tailored for multi-modal object re-identification, dubbed DMPT, which freezes the main backbone and optimizes only several newly added decoupled modality-aware parameters. Specifically, we explicitly decouple the visual prompts into modality-specific prompts, which leverage prior modality knowledge from a powerful text encoder, and modality-independent semantic prompts, which extract semantic information from multi-modal inputs such as visible, near-infrared, and thermal-infrared images. Building on the extracted features, we further design a Prompt Inverse Bind (PromptIBind) strategy that uses bind prompts as a medium to connect the semantic prompt tokens of different modalities and facilitate the exchange of complementary multi-modal information, boosting final re-identification results. Experimental results on multiple common benchmarks demonstrate that DMPT achieves results competitive with existing state-of-the-art methods while requiring fine-tuning of only 6.5% of the backbone parameters.
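To give a rough sense of scale for the 6.5% figure, the following back-of-the-envelope sketch counts the parameters of a frozen Transformer encoder and compares them with a small set of trainable prompt tokens. Everything specific here is an assumption for illustration: the abstract names ViT but not the variant (ViT-B dimensions are assumed), the 6.5% is read as trainable parameters relative to the backbone size, and the prompt layout (`PROMPTS_PER_LAYER`, one prompt set per modality per block) is hypothetical and not taken from the paper.

```python
def vit_block_params(dim: int, mlp_dim: int) -> int:
    """Parameter count of one standard Transformer encoder block."""
    attention = 4 * (dim * dim + dim)  # Q, K, V and output projections, with biases
    mlp = (dim * mlp_dim + mlp_dim) + (mlp_dim * dim + dim)  # two linear layers
    layer_norms = 2 * (2 * dim)  # two LayerNorms, each with scale and shift
    return attention + mlp + layer_norms


def trainable_fraction(trainable: int, frozen: int) -> float:
    """Trainable parameter count expressed relative to the frozen backbone size."""
    return trainable / frozen


# Assumption: ViT-B dimensions (12 blocks, width 768, MLP width 3072).
# Patch embedding, class token and position embeddings are left out for simplicity.
DEPTH, DIM, MLP_DIM = 12, 768, 3072
backbone = DEPTH * vit_block_params(DIM, MLP_DIM)

# Hypothetical prompt layout: a few learnable tokens per block for each of the
# three modalities named in the abstract (visible, near-infrared, thermal-infrared).
NUM_MODALITIES = 3
PROMPTS_PER_LAYER = 8
prompt_params = NUM_MODALITIES * DEPTH * PROMPTS_PER_LAYER * DIM

# Parameter budget implied by the assumed reading of the 6.5% figure.
budget = int(0.065 * backbone)

print(f"frozen encoder blocks: {backbone:,} parameters")
print(f"hypothetical prompts:  {prompt_params:,} parameters "
      f"({trainable_fraction(prompt_params, backbone):.2%} of the backbone)")
print(f"6.5% budget:           {budget:,} parameters")
```

Under these assumptions the raw prompt tokens use only a small part of the 6.5% budget. The remainder would plausibly go to other newly added modules, but the abstract does not break the figure down, so this sketch should not be read as the paper's actual parameter accounting.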
