CVMMMay 8, 2023

IIITD-20K: Dense captioning for Text-Image ReID

arXiv:2305.04497v18 citations
Originality Synthesis-oriented
AI Analysis

This provides a more diverse and richly annotated dataset for researchers in text-to-image ReID, though it is incremental as it builds on existing benchmarks.

The authors tackled the lack of diversity and detailed captions in existing text-to-image ReID datasets by introducing IIITD-20K, a new dataset with 20,000 unique identities and dense captions of at least 26 words per image, which led to substantial performance improvements when using synthetically generated data.

Text-to-Image (T2I) ReID has attracted a lot of attention in the recent past. CUHK-PEDES, RSTPReid and ICFG-PEDES are the three available benchmarks to evaluate T2I ReID methods. RSTPReid and ICFG-PEDES comprise of identities from MSMT17 but due to limited number of unique persons, the diversity is limited. On the other hand, CUHK-PEDES comprises of 13,003 identities but has relatively shorter text description on average. Further, these datasets are captured in a restricted environment with limited number of cameras. In order to further diversify the identities and provide dense captions, we propose a novel dataset called IIITD-20K. IIITD-20K comprises of 20,000 unique identities captured in the wild and provides a rich dataset for text-to-image ReID. With a minimum of 26 words for a description, each image is densely captioned. We further synthetically generate images and fine-grained captions using Stable-diffusion and BLIP models trained on our dataset. We perform elaborate experiments using state-of-art text-to-image ReID models and vision-language pre-trained models and present a comprehensive analysis of the dataset. Our experiments also reveal that synthetically generated data leads to a substantial performance improvement in both same dataset as well as cross dataset settings. Our dataset is available at https://bit.ly/3pkA3Rj.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes