Zexian Yang

CV
h-index 18
4 papers
19 citations
Novelty 59%
AI Score 42

4 Papers

CV | Oct 17, 2022
Handling Label Uncertainty for Camera Incremental Person Re-Identification

Zexian Yang, Dayan Wu, Wanqian Zhang et al.

Incremental learning for person re-identification (ReID) aims to develop models that can be trained on a continuous data stream, which is a more practical setting for real-world applications. However, existing incremental ReID methods make two strong assumptions: that the cameras are fixed, and that newly emerging data is class-disjoint from previous classes. This is unrealistic, as previously observed pedestrians may reappear and be captured again by new cameras. In this paper, we investigate person ReID in an unexplored scenario named Camera Incremental Person ReID (CIPR), which advances existing lifelong person ReID by taking the class overlap issue into account. Specifically, new data collected from new cameras may contain an unknown proportion of previously seen identities. In addition, privacy concerns leave the new data without cross-camera annotations. To address these challenges, we propose a novel framework, ExtendOVA. First, to handle the class overlap issue, we introduce an instance-wise seen-class identification module that discovers previously seen identities at the instance level. Then, we propose a criterion for selecting confident ID-wise candidates and devise an early learning regularization term to correct noise in the pseudo labels. Furthermore, to compensate for the lack of previous data, we resort to a prototypical memory bank to create surrogate features, along with a cross-camera distillation loss to further retain the inter-camera relationship. Comprehensive experimental results on multiple benchmarks show that ExtendOVA significantly outperforms state-of-the-art methods.

68.2 | CV | Apr 7
Weather-Conditioned Branch Routing for Robust LiDAR-Radar 3D Object Detection

Hongsheng Li, Lingfeng Zhang, Zexian Yang et al.

Robust 3D object detection in adverse weather is highly challenging due to the varying reliability of different sensors. While existing LiDAR-4D radar fusion methods improve robustness, they predominantly rely on fixed or weakly adaptive pipelines, failing to dynamically adjust modality preferences as environmental conditions change. To bridge this gap, we reformulate multi-modal perception as a weather-conditioned branch routing problem. Instead of computing a single fused output, our framework explicitly maintains three parallel 3D feature streams: a pure LiDAR branch, a pure 4D radar branch, and a condition-gated fusion branch. Guided by a condition token extracted from visual and semantic prompts, a lightweight router dynamically predicts sample-specific weights to softly aggregate these representations. Furthermore, to prevent branch collapse, we introduce a weather-supervised learning strategy with auxiliary classification and diversity regularization to enforce distinct, condition-dependent routing behaviors. Extensive experiments on the K-Radar benchmark demonstrate that our method achieves state-of-the-art performance. It also provides explicit, interpretable insights into modality preferences, revealing how adaptive routing shifts reliance between LiDAR and 4D radar across diverse adverse-weather scenarios. The source code will be released.
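The soft aggregation step described in this abstract can be illustrated with a minimal sketch. It assumes the router emits one logit per branch and that the weights come from a softmax over those logits; the abstract does not state the exact weighting function, and the names `softmax`, `route`, and the toy two-dimensional features are hypothetical, not taken from the paper.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route(branch_features, router_logits):
    """Softly aggregate per-branch feature vectors.

    branch_features: list of equal-length feature vectors, one per branch
                     (assumed order: LiDAR, 4D radar, fusion).
    router_logits:   one logit per branch, assumed to come from a router
                     conditioned on the weather token.
    """
    weights = softmax(router_logits)
    dim = len(branch_features[0])
    fused = [
        sum(w * feat[i] for w, feat in zip(weights, branch_features))
        for i in range(dim)
    ]
    return weights, fused

# Toy features; real branches would produce 3D feature maps.
lidar = [1.0, 0.0]
radar = [0.0, 1.0]
fusion = [0.5, 0.5]

# Equal logits give equal weights, so the output is the plain average.
weights, fused = route([lidar, radar, fusion], [0.0, 0.0, 0.0])

# A router that strongly prefers radar pulls the output toward the radar branch.
radar_weights, radar_fused = route([lidar, radar, fusion], [0.0, 5.0, 0.0])
```

Because the weights are a convex combination, any single branch can dominate when its logit is large, which is the kind of behavior the abstract describes as shifting reliance between sensors.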

CV | Jun 1, 2025 | Code
Uneven Event Modeling for Partially Relevant Video Retrieval

Sa Zhu, Huashan Chen, Wanqian Zhang et al.

Given a text query, partially relevant video retrieval (PRVR) aims to retrieve untrimmed videos containing relevant moments, wherein event modeling is crucial for partitioning the video into smaller temporal events that partially correspond to the text. Previous methods typically segment videos into a fixed number of equal-length clips, resulting in ambiguous event boundaries. Additionally, they rely on mean pooling to compute event representations, inevitably introducing undesired misalignment. To address these issues, we propose an Uneven Event Modeling (UEM) framework for PRVR. We first introduce the Progressive-Grouped Video Segmentation (PGVS) module, which iteratively forms events in light of both temporal dependencies and semantic similarity between consecutive frames, enabling clear event boundaries. Furthermore, we propose the Context-Aware Event Refinement (CAER) module to refine the event representation conditioned on the text's cross-attention. This enables event representations to focus on the most relevant frames for a given text, facilitating more precise text-video alignment. Extensive experiments demonstrate that our method achieves state-of-the-art performance on two PRVR benchmarks. Code is available at https://github.com/Sasa77777779/UEM.git.
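To make the contrast with equal-length clips concrete, the sketch below groups consecutive frames into variable-length events using a simple cosine-similarity threshold. This is a simplified stand-in and not the PGVS module itself: the greedy rule, the `threshold` value, and the names `cosine` and `group_frames` are assumptions made for illustration only.

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def group_frames(frames, threshold=0.8):
    """Greedily split frames into events of uneven length.

    A frame joins the current event when it is similar enough to the
    previous frame; otherwise it starts a new event. Returns a list of
    events, each a list of frame indices in temporal order.
    """
    if not frames:
        return []
    events = [[0]]
    for i in range(1, len(frames)):
        if cosine(frames[i - 1], frames[i]) >= threshold:
            events[-1].append(i)
        else:
            events.append([i])
    return events

# Toy frame features: two frames of one scene, then three of another.
frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.05, 1.0]]
events = group_frames(frames)
```

On this toy input the boundary falls between the second and third frames, giving one event of two frames and one of three, whereas a fixed equal-length split would cut through the second scene.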

CV | May 12, 2025
Critique Before Thinking: Mitigating Hallucination through Rationale-Augmented Instruction Tuning

Zexian Yang, Dian Li, Dayan Wu et al.

Despite significant advancements in multimodal reasoning tasks, existing Large Vision-Language Models (LVLMs) are prone to producing visually ungrounded responses when interpreting associated images. In contrast, when humans set out to learn new knowledge, they often rely on a set of fundamental pre-study principles: reviewing outlines to grasp core concepts, and summarizing key points to guide their focus and enhance understanding. However, such preparatory actions are notably absent from current instruction tuning processes. This paper presents Re-Critic, an easily scalable rationale-augmented framework designed to incorporate fundamental rules and chain-of-thought (CoT) as a bridge to enhance reasoning abilities. Specifically, Re-Critic develops a visual rationale synthesizer that scalably augments raw instructions with rationale explanations. To elicit more contextually grounded responses, Re-Critic employs an in-context self-critic mechanism to select response pairs for preference tuning. Experiments demonstrate that models fine-tuned with our rationale-augmented dataset yield gains that extend beyond hallucination-specific tasks to broader multimodal reasoning tasks.