An Evaluation of Deep CNN Baselines for Scene-Independent Person Re-Identification
This work addresses the challenge of simplifying the deployment of person re-identification systems by reducing reliance on labeled or adapted data from specific scenes, though its contribution is incremental, since it evaluates baseline architectures rather than proposing a new method.
The paper tackles the problem of scene-independent person re-identification by training deep CNNs on a large composite dataset, and shows that this approach can achieve results competitive with unsupervised domain adaptation techniques.
In recent years, a variety of proposed methods based on deep convolutional neural networks (CNNs) have improved the state of the art for large-scale person re-identification (ReID). While a large number of optimizations and network improvements have been proposed, there has been relatively little evaluation of the influence of training data and baseline network architecture. In particular, it is usually assumed either that networks are trained on labeled data from the deployment location (scene-dependent), or else adapted with unlabeled data, both of which complicate system deployment. In this paper, we investigate the feasibility of achieving scene-independent person ReID by forming a large composite dataset for training. We present an in-depth comparison of several CNN baseline architectures for both scene-dependent and scene-independent ReID, across a range of training dataset sizes. We show that scene-independent ReID can produce leading-edge results, competitive with unsupervised domain adaptation techniques. Finally, we introduce a new dataset for comparing within-camera and across-camera person ReID.
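As an illustration of what forming a composite training set can involve, the sketch below merges several ReID datasets into one list while remapping person identities so that labels from different sources do not collide. It is a minimal sketch under stated assumptions, not the procedure used in the paper: the record layout `(image_path, person_id, camera_id)`, the function name `build_composite`, and the dataset names in the usage note are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical sample record: (image_path, person_id, camera_id).
Sample = Tuple[str, int, int]
# Composite record: (image_path, global_person_id, camera_id, source_dataset).
CompositeSample = Tuple[str, int, int, str]


def build_composite(datasets: Dict[str, List[Sample]]) -> List[CompositeSample]:
    """Merge several ReID datasets into one list with globally unique identity labels.

    Person IDs are only unique within their own dataset, so each
    (dataset name, local person ID) pair is assigned a fresh global ID.
    """
    composite: List[CompositeSample] = []
    global_ids: Dict[Tuple[str, int], int] = {}
    # Iterate in sorted order so the label assignment is deterministic.
    for name in sorted(datasets):
        for path, pid, cam in datasets[name]:
            key = (name, pid)
            if key not in global_ids:
                global_ids[key] = len(global_ids)
            composite.append((path, global_ids[key], cam, name))
    return composite
```

For example, calling `build_composite({"set_a": [...], "set_b": [...]})` with two hypothetical sources yields a single list whose identity labels run contiguously from zero, which is the form a classification-style training loss would typically expect. Keeping the source name in each record makes it straightforward to hold out one dataset entirely when evaluating scene-independent performance.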