Chaocan Xue

h-index: 15
2 papers


CV · Mar 9, 2025
Similarity-Guided Layer-Adaptive Vision Transformer for UAV Tracking

Chaocan Xue, Bineng Zhong, Qihua Liang et al.

Vision transformers (ViTs) have emerged as a popular backbone for visual tracking. However, complete ViT architectures are too cumbersome to deploy for unmanned aerial vehicle (UAV) tracking, where efficiency is paramount. In this study, we find that many layers within lightweight ViT-based trackers tend to learn relatively redundant and repetitive target representations. Based on this observation, we propose a similarity-guided layer adaptation approach to optimize the structure of ViTs. Our approach dynamically disables a large number of representation-similar layers and selectively retains only a single optimal layer among them, aiming to achieve a better accuracy-speed trade-off. By incorporating this approach into existing ViTs, we tailor previously complete ViT architectures into an efficient similarity-guided layer-adaptive framework, named SGLATrack, for real-time UAV tracking. Extensive experiments on six tracking benchmarks verify the effectiveness of the proposed approach and show that SGLATrack achieves state-of-the-art real-time speed while maintaining competitive tracking precision. Code and models are available at https://github.com/GXNU-ZhongLab/SGLATrack.
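The abstract does not say how layer similarity is measured or how the retained layer is chosen. As a rough illustration of the general idea only, the sketch below assumes cosine similarity between flattened layer outputs and a hypothetical threshold of 0.95. It keeps the first layer of each group of similar layers, whereas the abstract speaks of retaining a single optimal layer. The names `cosine_similarity` and `select_layers` are hypothetical; this is not SGLATrack's implementation.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two flattened feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_layers(layer_outputs: List[Sequence[float]], threshold: float = 0.95) -> List[int]:
    """Hypothetical helper: indices of layers to keep at inference time.

    A layer whose output stays at least `threshold`-similar to the output of the
    most recently kept layer is treated as redundant and disabled; otherwise it
    starts a new group and is kept as that group's representative.
    """
    if not layer_outputs:
        return []
    keep = [0]  # the first layer is always kept
    for i in range(1, len(layer_outputs)):
        reference = layer_outputs[keep[-1]]
        if cosine_similarity(reference, layer_outputs[i]) < threshold:
            keep.append(i)
    return keep


if __name__ == "__main__":
    # Toy per-layer outputs: layers 1 and 3 nearly repeat the layer before them.
    outputs = [[1.0, 0.0], [0.99, 0.01], [0.0, 1.0], [0.01, 0.99], [1.0, 1.0]]
    print(select_layers(outputs))  # prints [0, 2, 4]
```

Comparing each layer against the last kept layer, rather than only its immediate predecessor, prevents a slow drift made of many small, individually similar steps from being collapsed into a single group.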

CV · Mar 12, 2025
Bidirectional Prototype-Reward co-Evolution for Test-Time Adaptation of Vision-Language Models

Xiaozhen Qiao, Peng Huang, Jiakang Yuan et al.

Test-time adaptation (TTA) is crucial for maintaining the performance of Vision-Language Models (VLMs) under distribution shifts, particularly when the source data or target labels are inaccessible. Existing TTA methods predominantly rely on CLIP's output probability distribution to evaluate features, which introduces biases under domain shifts and leads to misclassified features caused by text priors or incorrect textual associations. To address these issues, we propose Bidirectional Prototype-Reward co-Evolution (BPRE), a novel TTA framework for VLMs that integrates feature quality assessment with prototype evolution through a synergistic feedback loop. First, the Multi-dimensional Quality-aware Reward Module (MQRM) evaluates feature quality and precisely guides prototype refinement. The continuous refinement of prototype quality via Prototype-Reward Interactive Evolution (PRIE) makes the computation more robust. Through this bidirectional interaction, reward precision and prototype evolution mutually reinforce each other, forming a self-evolving feedback cycle. Extensive experiments on 15 diverse recognition datasets demonstrate that our model consistently outperforms other state-of-the-art (SOTA) methods and advances VLM generalization by emphasizing comprehensive feature evaluation.
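The abstract describes a loop in which quality rewards steer prototype updates, and refined prototypes in turn yield better rewards, but it gives no formulas. The sketch below shows one generic form such a loop can take. Each test feature is assigned to its nearest class prototype, a reward is computed from that similarity, and the prototype moves toward the feature by a reward-weighted step. The cosine-based reward is a single-score stand-in for MQRM's multi-dimensional quality assessment, and `update_prototype`, `momentum`, and `min_reward` are hypothetical; this is not the BPRE algorithm.

```python
import math
from typing import Dict, List, Sequence, Tuple


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity computed on unit-normalized copies of the inputs."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def update_prototype(
    prototypes: Dict[str, List[float]],
    feature: Sequence[float],
    momentum: float = 0.9,
    min_reward: float = 0.75,
) -> Tuple[str, float]:
    """Hypothetical reward-guided prototype update for one test feature.

    Returns (predicted label, reward). Prototypes are modified in place.
    """
    # Predict with the current prototypes: nearest prototype by cosine similarity.
    scores = {label: cosine(feature, proto) for label, proto in prototypes.items()}
    label = max(scores, key=lambda k: scores[k])
    # Stand-in reward: map cosine similarity from [-1, 1] to [0, 1].
    reward = (scores[label] + 1.0) / 2.0
    if reward < min_reward:
        return label, reward  # low-quality feature: leave the prototype unchanged
    # Higher reward gives a larger step toward the feature.
    alpha = (1.0 - momentum) * reward
    f = normalize(feature)
    prototypes[label] = normalize(
        [(1.0 - alpha) * p + alpha * x for p, x in zip(prototypes[label], f)]
    )
    return label, reward


if __name__ == "__main__":
    protos = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
    print(update_prototype(protos, [0.9, 0.1]))   # confident: "cat" prototype is nudged
    print(update_prototype(protos, [-1.0, 0.0]))  # low reward: no prototype changes
    print(protos)
```

Skipping low-reward features keeps unreliable assignments from dragging prototypes off course, while each accepted update changes the prototypes used to score later features. That coupling is the feedback the abstract describes, here reduced to its simplest form.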