MovieNet-PS: A Large-Scale Person Search Dataset in the Wild
This work addresses person search in complex, real-world images, advancing prior methods by exploiting scene context and group context jointly rather than separately.
The paper tackles person search, which jointly localizes and identifies individuals in uncropped images. It introduces a unified global-local context network (GLCNet) that uses scene and group context to enhance re-ID features. GLCNet consistently improves over state-of-the-art methods on CUHK-SYSU, PRW, and a new character search dataset built on MovieNet.
Person search, which aims to jointly localize and identify a query person in natural, uncropped images, has been actively studied over the past few years. In this paper, we delve into the rich context information that surrounds the target person globally and locally, which we refer to as scene context and group context, respectively. Unlike previous works that treat the two types of context individually, we exploit them in a unified global-local context network (GLCNet) with the intuitive aim of feature enhancement. Specifically, re-ID embeddings and context features are learned simultaneously in a multi-stage fashion, ultimately yielding enhanced, discriminative features for person search. We conduct experiments on two person search benchmarks (i.e., CUHK-SYSU and PRW) and extend our approach to a more challenging setting (i.e., character search on MovieNet). Extensive experimental results demonstrate that the proposed GLCNet consistently improves over state-of-the-art methods on all three datasets. Our source code, pre-trained models, and the new dataset are publicly available at: https://github.com/ZhengPeng7/GLCNet.
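To make the idea of context-based feature enhancement concrete, the following is a minimal sketch, not the GLCNet architecture itself. It assumes that each detected person already has a re-ID embedding, that scene context is a single image-level vector, and that group context is the mean embedding of the other people in the same image. Fusion by concatenation and all function names (`enhance_features`, `rank_gallery`) are assumptions made for illustration; the paper learns these features jointly in a multi-stage network.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def mean_vector(vectors, dim):
    """Element-wise mean of a list of vectors; zeros if the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def enhance_features(person_feats, scene_feat):
    """Fuse each person's embedding with scene and group context.

    Assumption: fusion is a plain concatenation of three normalized parts,
    followed by a final normalization. This is illustrative only.
    """
    dim = len(person_feats[0])
    enhanced = []
    for i, feat in enumerate(person_feats):
        # Group context: mean embedding of the *other* people in the image.
        others = [f for j, f in enumerate(person_feats) if j != i]
        group_feat = mean_vector(others, dim)
        fused = l2_normalize(feat) + l2_normalize(scene_feat) + l2_normalize(group_feat)
        enhanced.append(l2_normalize(fused))
    return enhanced


def cosine_similarity(a, b):
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def rank_gallery(query_feat, gallery_feats):
    """Return gallery indices sorted from most to least similar to the query."""
    scores = [cosine_similarity(query_feat, g) for g in gallery_feats]
    return sorted(range(len(gallery_feats)), key=lambda i: scores[i], reverse=True)
```

At retrieval time, the enhanced feature of the query person would be compared against the enhanced features of every detected person in the gallery images, and `rank_gallery` returns the candidate order.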