CV | Jul 10, 2024

Study on Aspect Ratio Variability toward Robustness of Vision Transformer-based Vehicle Re-identification

arXiv:2407.07842v1 | 2 citations | h-index: 4
Originality: Incremental advance
AI Analysis

This work addresses robustness issues in vehicle re-identification for surveillance and security applications, representing an incremental improvement with specific gains.

The paper tackles the problem of non-square aspect ratios degrading Vision Transformer performance in vehicle re-identification, proposing a framework that fuses models trained on varied aspect ratios and achieves a mean Average Precision of 91.0% on the VehicleID dataset, compared to a prior state-of-the-art of 80.9%.

Vision Transformers (ViTs) have excelled in vehicle re-identification (ReID) tasks. However, non-square aspect ratios of image or video input might significantly affect re-identification performance. To address this issue, we propose a novel ViT-based ReID framework that fuses models trained on a variety of aspect ratios. Our main contributions are threefold: (i) We analyze how aspect ratio affects performance on the VeRi-776 and VehicleID datasets, guiding input settings based on the aspect ratios of the original images. (ii) We introduce intra-image patch-wise mixup during ViT patchification (guided by spatial attention scores) and implement uneven stride for better object aspect ratio matching. (iii) We propose a dynamic feature fusion ReID network, enhancing model robustness. Our ReID method achieves a significantly improved mean Average Precision (mAP) of 91.0% compared to the closest state-of-the-art (CAL) result of 80.9% on the VehicleID dataset.
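To make the "uneven stride" idea in contribution (ii) concrete, the sketch below computes how many patches a sliding-window patch embedding yields when the vertical and horizontal strides differ. It is only an illustration under stated assumptions: the formula is the standard output-size rule for an unpadded strided window, and the function name `patch_grid`, the image sizes, the patch size, and the stride values are all hypothetical and not taken from the paper.

```python
def patch_grid(height, width, patch, stride_h, stride_w):
    """Return (rows, cols) of patches for an unpadded sliding-window patchifier.

    Assumption: patches are square (patch x patch) and are extracted like a
    strided convolution without padding, so each axis yields
    floor((size - patch) / stride) + 1 patches.
    """
    if height < patch or width < patch:
        raise ValueError("image is smaller than a single patch")
    rows = (height - patch) // stride_h + 1
    cols = (width - patch) // stride_w + 1
    return rows, cols


# Hypothetical inputs: a square image with equal strides, and a wider image
# whose horizontal stride is smaller than its vertical stride.
square = patch_grid(224, 224, patch=16, stride_h=16, stride_w=16)
wide = patch_grid(224, 298, patch=16, stride_h=16, stride_w=12)

print("square grid:", square, "tokens:", square[0] * square[1])
print("wide grid:", wide, "tokens:", wide[0] * wide[1])
```

Choosing a different stride per axis changes the token grid independently in each direction, which is one way a patchifier could be matched to a non-square object. How the paper actually selects its strides is not stated in the abstract.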