CV · Aug 14, 2024

Cross-Platform Video Person ReID: A New Benchmark Dataset and Adaptation Approach

Shizhou Zhang, Wenlong Luo, De Cheng, Qingchun Yang, Lingyan Ran, Yinghui Xing, Yanning Zhang

arXiv:2408.07500v2 · 17.827 citations · h-index: 72 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of identifying persons across ground and aerial platforms. The advance is incremental: it builds on existing video ReID methods, adding a new dataset and an adaptation approach.

The authors tackled the problem of cross-platform video person re-identification by constructing the first large-scale Ground-to-Aerial video dataset (G2A-VReID) with 185,907 images and 5,576 tracklets, and proposed a method using CLIP and adapters that achieved superior results on existing and new datasets.

In this paper, we construct a large-scale benchmark dataset for Ground-to-Aerial Video-based person Re-Identification, named G2A-VReID, which comprises 185,907 images and 5,576 tracklets, featuring 2,788 distinct identities. To our knowledge, this is the first dataset for video ReID under Ground-to-Aerial scenarios. The G2A-VReID dataset has the following characteristics: 1) drastic view changes; 2) a large number of annotated identities; 3) rich outdoor scenarios; 4) a huge difference in resolution. Additionally, we propose a new benchmark approach for cross-platform ReID, termed VSLA-CLIP. It transforms the cross-platform visual alignment problem into visual-semantic alignment through a vision-language model (i.e., CLIP) and applies a parameter-efficient Video Set-Level-Adapter module to adapt the image-based foundation model to video ReID tasks. To further reduce the large discrepancy between platforms, we also devise platform-bridge prompts for efficient visual feature alignment. Extensive experiments demonstrate the superiority of the proposed method on all existing video ReID datasets and our proposed G2A-VReID dataset.
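To give a sense of the dataset's scale, the short calculation below derives two averages from the figures quoted in the abstract. It is only a back-of-the-envelope sketch: the variable names are illustrative, and the abstract reports totals only, so nothing here describes how frames or tracklets are actually distributed across identities or platforms.

```python
# Totals quoted in the abstract; the variable names are illustrative.
num_images = 185_907
num_tracklets = 5_576
num_identities = 2_788

# Simple averages derived from the totals. The abstract does not give
# per-identity or per-platform breakdowns, so these are means only.
frames_per_tracklet = num_images / num_tracklets
tracklets_per_identity = num_tracklets / num_identities

print(f"average frames per tracklet: {frames_per_tracklet:.1f}")        # 33.3
print(f"average tracklets per identity: {tracklets_per_identity:.1f}")  # 2.0
```

An average of exactly two tracklets per identity would be consistent with one ground and one aerial tracklet for each person, but the abstract does not state this split, so treat it as an assumption.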
