CV · Aug 14, 2019

HorNet: A Hierarchical Offshoot Recurrent Network for Improving Person Re-ID via Image Captioning

arXiv:1908.04915v1 · 3.47 citations

Originality: Incremental advance

AI Analysis

The paper addresses appearance variance in person re-identification for surveillance and security applications, offering an incremental improvement by integrating image captioning.

The paper tackles person re-identification by proposing HorNet, a hierarchical offshoot recurrent network that uses image captioning to learn joint visual and language representations. It reports state-of-the-art performance on the benchmark datasets CUHK03, Market-1501, and Duke-MTMC.

Person re-identification (re-ID) aims to recognize a person of interest across different cameras despite notable appearance variance. Existing research has focused on the capability and robustness of visual representations. In this paper, we instead propose a novel hierarchical offshoot recurrent network (HorNet) for improving person re-ID via image captioning. Image captions are semantically richer and more consistent than visual attributes, which could significantly alleviate the variance. We use the similarity preserving generative adversarial network (SPGAN) and an image captioner to perform domain transfer and generate language descriptions. The proposed HorNet can then jointly learn visual and language representations from both the images and the captions, and thus enhance the performance of person re-ID. Extensive experiments are conducted on several benchmark datasets with or without image captions, i.e., CUHK03, Market-1501, and Duke-MTMC, demonstrating the superiority of the proposed method. Our method can generate and extract meaningful image captions while achieving state-of-the-art performance.
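
To make the idea of matching people on joint visual and language features concrete, here is a minimal sketch in Python. It is not the HorNet architecture, which the abstract does not specify in detail: the fusion step is a plain concatenation of L2-normalized vectors chosen as a stand-in, and the names `fuse`, `rank_gallery`, and the toy embeddings are all hypothetical.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def fuse(visual, caption):
    """Assumed fusion: concatenate normalized visual and caption embeddings.

    This is a simple stand-in for a jointly learned representation,
    not the method described in the paper.
    """
    return l2_normalize(l2_normalize(visual) + l2_normalize(caption))

def cosine(a, b):
    """Dot product of two unit-length vectors, i.e. their cosine similarity."""
    return sum(x * y for x, y in zip(a, b))

def rank_gallery(query, gallery):
    """Return gallery identities sorted from most to least similar to the query."""
    scores = {gid: cosine(query, feat) for gid, feat in gallery.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy embeddings (made up for illustration): 3-d visual, 2-d caption.
query = fuse([0.9, 0.1, 0.0], [1.0, 0.0])
gallery = {
    "person_a": fuse([0.8, 0.2, 0.1], [0.9, 0.1]),
    "person_b": fuse([0.1, 0.9, 0.3], [0.0, 1.0]),
}
ranking = rank_gallery(query, gallery)
print(ranking)
```

In this toy setup the caption half of the vector contributes to the similarity score alongside the visual half, which is the intuition behind using captions to reduce sensitivity to appearance variance across cameras.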

