CVOct 11, 2021

Rethinking Person Re-Identification via Semantic-Based Pretraining

arXiv:2110.05074v218 citations
AI Analysis

This addresses the pretraining bottleneck for person re-identification researchers, offering an incremental improvement over existing methods.

The paper tackles the limited impact of ImageNet pretraining on person re-identification due to domain gaps by proposing semantic-based pretraining using a manually constructed caption dataset, achieving competitive performance with up to 1.4x fewer images.

Pretraining is a dominant paradigm in computer vision. Generally, supervised ImageNet pretraining is commonly used to initialize the backbones of person re-identification (Re-ID) models. However, recent works show a surprising result that CNN-based pretraining on ImageNet has limited impacts on Re-ID system due to the large domain gap between ImageNet and person Re-ID data. To seek an alternative to traditional pretraining, here we investigate semantic-based pretraining as another method to utilize additional textual data against ImageNet pretraining. Specifically, we manually construct a diversified FineGPR-C caption dataset for the first time on person Re-ID events. Based on it, a pure semantic-based pretraining approach named VTBR is proposed to adopt dense captions to learn visual representations with fewer images. We train convolutional neural networks from scratch on the captions of FineGPR-C dataset, and then transfer them to downstream Re-ID tasks. Comprehensive experiments conducted on benchmark datasets show that our VTBR can achieve competitive performance compared with ImageNet pretraining - despite using up to 1.4x fewer images, revealing its potential in Re-ID pretraining.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes