Towards Practical Implementations of Person Re-Identification from Full Video Frames
This work addresses the gap between academic research and real-world deployment in security automation, highlighting an incremental but important shift in evaluation.
The paper argues that current person re-identification (Re-ID) methods, which operate on pre-cropped images, are insufficient for practical security applications whose inputs are full video frames. It also shows that combining a good detection model with a good Re-ID model does not necessarily yield good results.
With the widespread adoption of automation for city security, person re-identification (Re-ID) has been studied extensively in recent years. In this paper, we argue that the current way of studying person re-identification, i.e., re-identifying a person within already detected and pre-cropped images of people, is not sufficient to implement practical security applications, where the inputs to the system are the full frames of the video streams. To support this claim, we introduce the Full Frame Person Re-ID setting (FF-PRID) and define specific metrics to evaluate FF-PRID implementations. To improve robustness, we also formalize the hybrid human-machine collaboration framework, which is inherent to any Re-ID security application. To demonstrate the importance of considering the FF-PRID setting, we build an experiment showing that combining a good people detection network with a good Re-ID model does not necessarily produce good results for the final application. This underlines a failure of the current formulation in assessing the quality of a Re-ID model and justifies the use of different metrics. We hope that this work will motivate the research community to consider the full problem in order to develop algorithms that are better suited to real-world scenarios.
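One way to see why a good detector and a good Re-ID model need not compose well is to note that the Re-ID stage can only rank the crops the detector actually produces. The sketch below is an illustration only and is not one of the FF-PRID metrics defined in the paper: it assumes axis-aligned boxes in (x1, y1, x2, y2) form, a hypothetical IoU threshold of 0.5, and a hypothetical helper `detection_recall_ceiling` that measures the fraction of a query person's appearances covered by at least one detection. Under these assumptions, that fraction is an upper bound on how many appearances any downstream Re-ID model could retrieve.

```python
from typing import List, Tuple

# Assumed box convention (not from the paper): (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def detection_recall_ceiling(
    frames: List[Tuple[Box, List[Box]]], iou_threshold: float = 0.5
) -> float:
    """Fraction of the query person's ground-truth boxes that are matched
    by at least one detected box in the same frame.

    Each element of `frames` is (ground_truth_box, detected_boxes).
    The threshold of 0.5 is an illustrative assumption.
    """
    if not frames:
        return 0.0
    hits = sum(
        1
        for gt_box, detections in frames
        if any(iou(gt_box, det) >= iou_threshold for det in detections)
    )
    return hits / len(frames)


# Three frames containing the query person: one well-localized detection,
# one badly shifted detection, and one missed detection.
example_frames = [
    ((0.0, 0.0, 10.0, 10.0), [(0.0, 0.0, 10.0, 10.0)]),
    ((0.0, 0.0, 10.0, 10.0), [(5.0, 5.0, 15.0, 15.0)]),
    ((0.0, 0.0, 10.0, 10.0), []),
]
ceiling = detection_recall_ceiling(example_frames)
```

In this toy example only one of the three appearances survives detection, so even a perfect Re-ID model evaluated on pre-cropped images could retrieve at most a third of the person's appearances from the full frames.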