CV · Jan 24, 2022

End-to-end Person Search Sequentially Trained on Aggregated Dataset

arXiv:2201.09604v1 · 3 citations
AI Analysis

This work addresses person search for video surveillance applications, offering incremental improvements in efficiency and dataset flexibility.

The paper tackles the problem of person search in video surveillance by proposing an end-to-end model that jointly performs detection and feature extraction, achieving state-of-the-art accuracy and faster runtime. It shows that aggregating more pedestrian detection datasets without identity annotations improves re-ID precision and cross-dataset robustness.

In video surveillance applications, person search is a challenging task that consists of detecting people and extracting features from their silhouettes for re-identification (re-ID). We propose a new end-to-end model that jointly computes the detection and feature extraction steps through a single deep Convolutional Neural Network architecture. Sharing feature maps between the two tasks, so that they jointly describe people's commonalities and specificities, allows faster runtime, which is valuable in real-world applications. In addition to reaching state-of-the-art accuracy, this multi-task model can be sequentially trained task-by-task, which lets it accept a broader range of input dataset types. Indeed, we show that aggregating more pedestrian detection datasets without costly identity annotations makes the shared feature maps more generic and improves re-ID precision. Moreover, these boosted shared feature maps yield re-ID features that are more robust in a cross-dataset scenario.
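The two ideas in the abstract, one shared feature computation feeding both tasks and task-by-task training that admits detection-only datasets, can be illustrated with a toy sketch. Everything here is hypothetical and is not the authors' implementation: the class and function names, the placeholder arithmetic standing in for convolutional layers, the dataset names, and in particular the assumption that the shared layers stay fixed during the re-ID stage.

```python
from typing import Dict, List, Sequence, Tuple

Features = List[float]


class SharedBackbone:
    """Toy stand-in for the shared convolutional layers (hypothetical)."""

    def __init__(self) -> None:
        self.calls = 0  # counts forward passes, to make the sharing visible

    def __call__(self, image: Sequence[float]) -> Features:
        self.calls += 1
        return [2.0 * px for px in image]  # placeholder "feature map"


def detection_head(features: Features) -> float:
    """Toy detection score: mean activation of the shared features."""
    return sum(features) / len(features)


def reid_head(features: Features) -> Features:
    """Toy re-ID embedding: shared features scaled to unit L1 norm."""
    total = sum(abs(f) for f in features) or 1.0
    return [f / total for f in features]


def person_search(backbone: SharedBackbone,
                  image: Sequence[float]) -> Tuple[float, Features]:
    """Run both tasks from a single backbone pass."""
    features = backbone(image)  # computed once, reused by both heads
    return detection_head(features), reid_head(features)


# Assumed two-stage schedule: detection first, then re-ID on top of the
# already trained shared layers (freezing them is an assumption here).
TRAINING_STAGES = [
    {"task": "detection", "needs_identity_labels": False,
     "trainable": ["backbone", "detection_head"]},
    {"task": "re-ID", "needs_identity_labels": True,
     "trainable": ["reid_head"]},
]


def usable_datasets(stage: dict, datasets: Dict[str, bool]) -> List[str]:
    """Names of datasets whose annotations are sufficient for this stage.

    `datasets` maps a dataset name to whether it has identity labels.
    """
    return [name for name, has_ids in datasets.items()
            if has_ids or not stage["needs_identity_labels"]]
```

With made-up datasets `{"detection_only": False, "person_search": True}`, the detection stage can use both, while the re-ID stage can use only the one with identity labels. This is why a sequential schedule can absorb extra pedestrian detection data that a joint training setup requiring identity labels throughout could not.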
