Joint Detection and Identification Feature Learning for Person Search
This work addresses the gap between cropped-image re-identification benchmarks and real-world person search in applications such as surveillance, although the contribution is arguably incremental, since it combines two existing tasks.
The paper tackles person search in real-world scenarios where pedestrian bounding boxes are unavailable. It proposes a joint deep learning framework that integrates detection and re-identification into a single CNN trained with an Online Instance Matching (OIM) loss. The authors report that the framework outperforms approaches that treat the two tasks separately and that the OIM loss converges faster than the conventional Softmax loss.
Existing person re-identification benchmarks and methods mainly focus on matching cropped pedestrian images between queries and candidates. This setting differs from real-world scenarios, where annotations of pedestrian bounding boxes are unavailable and the target person must be searched for in a gallery of whole scene images. To close the gap, we propose a new deep learning framework for person search. Instead of breaking the problem down into two separate tasks (pedestrian detection and person re-identification), we handle both aspects jointly in a single convolutional neural network. An Online Instance Matching (OIM) loss function is proposed to train the network effectively, and it scales to datasets with numerous identities. To validate our approach, we collect and annotate a large-scale benchmark dataset for person search. It contains 18,184 images, 8,432 identities, and 96,143 pedestrian bounding boxes. Experiments show that our framework outperforms approaches that treat the two tasks separately, and that the proposed OIM loss function converges much faster and to a better result than the conventional Softmax loss.
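
The abstract does not spell out how the OIM loss is computed, so the following is only a minimal sketch of one way an instance-matching loss of this kind could work, not the authors' implementation. It assumes that each labeled identity has one stored feature vector in a memory table, that a sample's feature is compared with every stored vector by cosine similarity, and that a temperature-scaled softmax over those similarities gives the identity probabilities. The names `instance_matching_loss`, `update_lookup_table`, `lookup_table`, `temperature`, and `momentum` are hypothetical and chosen for illustration.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along `axis`."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def instance_matching_loss(features, labels, lookup_table, temperature=0.1):
    """Mean negative log-likelihood of the correct identity.

    features:     (batch, dim) embeddings of detected persons
    labels:       (batch,) integer identity indices into lookup_table
    lookup_table: (num_identities, dim) unit-norm stored identity features
    temperature:  assumed softmax temperature (hypothetical default)
    """
    features = l2_normalize(features)
    # Cosine similarity against every stored identity, scaled by temperature.
    logits = features @ lookup_table.T / temperature
    # Numerically stable log-softmax over identities.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def update_lookup_table(lookup_table, features, labels, momentum=0.5):
    """Blend each sample's feature into its identity's stored entry.

    The table is modified in place and returned; no gradients are involved,
    which is why the table can grow with the number of identities without
    adding classifier weights to train.
    """
    features = l2_normalize(features)
    for feature, label in zip(features, labels):
        blended = momentum * lookup_table[label] + (1.0 - momentum) * feature
        lookup_table[label] = l2_normalize(blended)
    return lookup_table
```

In a training loop under these assumptions, the loss would be computed on each mini-batch and the table updated afterwards with the same features. Unlike a Softmax classifier, there is no per-identity weight matrix to learn, which is one plausible reading of why such a loss would scale to many identities.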