CV, Feb 24, 2021

SFANet: A Spectrum-aware Feature Augmentation Network for Visible-Infrared Person Re-Identification

arXiv:2102.12137v1, 110 citations
Originality: Incremental advance
AI Analysis

This paper addresses cross-modality matching for surveillance systems, but the advance is incremental: it builds on existing two-stream networks with specific enhancements.

The paper tackles the problem of visible-infrared person re-identification by proposing SFANet, which uses grayscale-spectrum images to reduce modality discrepancies and incorporates a bi-directional tri-constrained loss, achieving competitive performance on SYSU-MM01 and RegDB datasets.

Visible-infrared person re-identification (VI-ReID) is a challenging matching problem due to large modality variations between visible and infrared images. Existing approaches usually bridge the modality gap with only feature-level constraints, ignoring pixel-level variations. Some methods employ GANs to generate style-consistent images, but this destroys structural information and introduces considerable noise. In this paper, we explicitly consider these challenges and formulate a novel spectrum-aware feature augmentation network, named SFANet, for the cross-modality matching problem. Specifically, we propose using grayscale-spectrum images to fully replace RGB images for feature learning. Learning with grayscale-spectrum images, our model can clearly reduce the modality discrepancy and detect inner structure relations across the different modalities, making it robust to color variations. At the feature level, we improve the conventional two-stream network by balancing the number of specific and sharable convolutional blocks, which preserves the spatial structure information of features. Additionally, a bi-directional tri-constrained top-push ranking loss (BTTR) is embedded in the proposed network to improve discriminability, which further boosts the matching accuracy. Meanwhile, we introduce an effective dual-linear with batch normalization ID embedding method to model identity-specific information and assist the BTTR loss in magnitude stabilization. On the SYSU-MM01 and RegDB datasets, we conducted extensive experiments to demonstrate that our proposed framework is effective and achieves very competitive VI-ReID performance.
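The pixel-level idea in the abstract, replacing RGB input with grayscale-spectrum images, can be illustrated with a minimal sketch. The abstract does not say how the conversion is done, so the BT.601 luma weights, the function name `to_grayscale_spectrum`, and the choice to repeat the gray value across three channels are all assumptions made for illustration, not the authors' implementation.

```python
# Luma weights from ITU-R BT.601. This is an assumed conversion: the abstract
# does not state which RGB-to-grayscale mapping SFANet uses.
LUMA_WEIGHTS = (0.299, 0.587, 0.114)


def to_grayscale_spectrum(image):
    """Convert an H x W image of (R, G, B) tuples to 3-channel grayscale.

    Each output pixel repeats the luma value three times, so the result keeps
    the shape a backbone built for 3-channel input would expect (hypothetical
    design choice for this sketch).
    """
    wr, wg, wb = LUMA_WEIGHTS
    out = []
    for row in image:
        new_row = []
        for r, g, b in row:
            y = round(wr * r + wg * g + wb * b)
            new_row.append((y, y, y))
        out.append(new_row)
    return out


# Tiny 2 x 2 example image: red, green, blue, white.
rgb = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
gray = to_grayscale_spectrum(rgb)
```

After conversion, every pixel carries only intensity information, which is one plausible reading of why such input would be less sensitive to color variations between the visible and infrared modalities.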

