CV · Jul 27, 2021

Semantically Self-Aligned Network for Text-to-Image Part-aware Person Re-identification

Zefeng Ding, Changxing Ding, Zhiyin Shao, Dacheng Tao

arXiv:2107.12666v226.2256 citations · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenging problem of searching for person images using textual descriptions. The advance is incremental: it builds on existing methods, adding new techniques and a new dataset.

The authors tackle text-to-image person re-identification with a Semantically Self-Aligned Network (SSAN) that extracts part-level features from both modalities and combines a multi-view non-local network with a Compound Ranking (CR) loss. They report that SSAN outperforms state-of-the-art approaches by significant margins.

Text-to-image person re-identification (ReID) aims to search for images containing a person of interest using textual descriptions. However, due to the significant modality gap and the large intra-class variance in textual descriptions, text-to-image ReID remains a challenging problem. Accordingly, in this paper, we propose a Semantically Self-Aligned Network (SSAN) to handle the above problems. First, we propose a novel method that automatically extracts semantically aligned part-level features from the two modalities. Second, we design a multi-view non-local network that captures the relationships between body parts, thereby establishing better correspondences between body parts and noun phrases. Third, we introduce a Compound Ranking (CR) loss that makes use of textual descriptions for other images of the same identity to provide extra supervision, thereby effectively reducing the intra-class variance in textual features. Finally, to expedite future research in text-to-image ReID, we build a new database named ICFG-PEDES. Extensive experiments demonstrate that SSAN outperforms state-of-the-art approaches by significant margins. Both the new ICFG-PEDES database and the SSAN code are available at https://github.com/zifyloo/SSAN.
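The idea behind the CR loss, using descriptions of other images of the same identity as extra supervision, can be illustrated with a small sketch. This is not the authors' implementation: the function names, the hinge form, the two margins, and the weighting factor are all assumptions made for illustration, and the official formulation is in the linked repository. The sketch combines a standard bidirectional ranking term for the matched image-text pair with a down-weighted term that treats a description of another image of the same person as a weaker positive.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def hinge(margin, pos_sim, neg_sim):
    """Ranking hinge: zero once the positive beats the negative by `margin`."""
    return max(0.0, margin - pos_sim + neg_sim)


def compound_ranking_sketch(img, txt_same_image, txt_same_identity,
                            txt_negative, img_negative,
                            margin_strong=0.2, margin_weak=0.1, weight=0.5):
    """Hypothetical compound ranking loss for one image.

    txt_same_image:    description written for this exact image
    txt_same_identity: description of a different image of the same person
    txt_negative / img_negative: text and image of a different person
    The margins and weight are illustrative values, not taken from the paper.
    """
    s_pos = cosine(img, txt_same_image)
    s_weak = cosine(img, txt_same_identity)

    # Standard term: the matched pair must outrank negatives in both directions.
    strong = (hinge(margin_strong, s_pos, cosine(img, txt_negative))
              + hinge(margin_strong, s_pos, cosine(img_negative, txt_same_image)))

    # Extra supervision: the same-identity description is a weaker positive.
    weak = (hinge(margin_weak, s_weak, cosine(img, txt_negative))
            + hinge(margin_weak, s_weak, cosine(img_negative, txt_same_identity)))

    return strong + weight * weak
```

Pulling same-identity descriptions toward the image, even with a smaller margin, is one way to encourage textual features of the same person to cluster, which is the effect the abstract attributes to the CR loss.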
