CV · Apr 16

Beyond Visual Cues: Semantic-Driven Token Filtering and Expert Routing for Anytime Person ReID

arXiv:2604.15090 · 57.8 · h-index: 3
Predicted impact: top 60% in CV · last 90 days · Originality: Incremental advance
AI Analysis

For person re-identification researchers, this work addresses the bottleneck of visual feature instability under environmental and clothing variations by introducing semantic guidance from LVLMs.

The paper tackles the problem of Any-Time Person Re-identification (AT-ReID) under varying conditions (modality shifts, clothing changes). The proposed STFER framework uses LVLMs to generate identity-consistent text for semantic-driven token filtering and expert routing, achieving state-of-the-art results on the AT-USTC dataset and strong generalization across 5 ReID benchmarks.

Any-Time Person Re-identification (AT-ReID) requires robust retrieval of target individuals under arbitrary conditions, encompassing both modality shifts (daytime and nighttime) and clothing-change scenarios ranging from short-term to long-term intervals. However, existing methods rely heavily on purely visual features, which change with environmental and temporal factors, so performance deteriorates significantly under illumination-induced modality shifts or clothing changes. In this paper, we propose Semantic-driven Token Filtering and Expert Routing (STFER), a novel framework that leverages Large Vision-Language Models (LVLMs) to generate identity-consistent text, which provides identity-discriminative features that are robust to both clothing variations and cross-modality shifts between RGB and IR. Specifically, we use instructions to guide the LVLM in generating identity-intrinsic semantic text that captures biometric constants, and this text provides the semantic guidance for the model. The text token is then used for Semantic-driven Visual Token Filtering (SVTF), which enhances informative visual regions and suppresses redundant background noise. The text token is also used for Semantic-driven Expert Routing (SER), which integrates the semantic text into expert routing, resulting in more robust multi-scenario gating. Extensive experiments on the Any-Time ReID dataset (AT-USTC) demonstrate that our model achieves state-of-the-art results. Moreover, the model trained on AT-USTC was evaluated on 5 widely used ReID benchmarks, demonstrating superior generalization capabilities with highly competitive results. Our code will be available soon.
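
The abstract does not specify how SVTF scores tokens or how SER builds its gate, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes, purely for illustration, that visual tokens are ranked by cosine similarity to a single text token and that the gate is a linear layer over the concatenated visual and text features. All function names, `keep_ratio`, and `gate_weights` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def filter_visual_tokens(visual_tokens, text_token, keep_ratio=0.5):
    """Keep the visual tokens most similar to the text token.

    Assumption: similarity to the text token stands in for "informative
    region"; the paper may use a different scoring rule.
    """
    scores = [cosine(v, text_token) for v in visual_tokens]
    k = max(1, math.ceil(len(visual_tokens) * keep_ratio))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept_idx = sorted(ranked[:k])  # restore original token order
    return [visual_tokens[i] for i in kept_idx], kept_idx


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_experts(visual_feature, text_token, gate_weights):
    """Compute expert gates from visual and text features.

    Assumption: a linear gate over the concatenated features; each row of
    gate_weights holds the weights for one expert.
    """
    gate_input = list(visual_feature) + list(text_token)
    logits = [sum(w * x for w, x in zip(row, gate_input)) for row in gate_weights]
    return softmax(logits)


def mix_experts(expert_outputs, gates):
    """Gate-weighted sum of the expert output vectors."""
    dim = len(expert_outputs[0])
    return [sum(g * out[d] for g, out in zip(gates, expert_outputs)) for d in range(dim)]


# Toy usage: 4 two-dimensional visual tokens, 1 text token, 2 experts.
tokens = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
text = [1.0, 0.0]
kept, kept_idx = filter_visual_tokens(tokens, text, keep_ratio=0.5)
gates = route_experts(kept[0], text, [[1, 0, 1, 0], [0, 1, 0, 1]])
mixed = mix_experts([[1.0, 0.0], [0.0, 1.0]], gates)
```

In this toy setup, the tokens aligned with the text direction survive filtering, and the gate favours the first expert because both the visual and text inputs activate its weights. A real model would learn the projections and gate weights end to end.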

