ReText: Text Boosts Generalization in Image-Based Person Re-identification

Timur Mamedov, Karina Kvanchiani, Anton Konushin, Vadim Konushin

arXiv:2602.05785

Originality: Incremental advance

AI Analysis

The paper addresses domain generalization in person re-identification for surveillance and security applications. It offers an incremental improvement by combining multimodal (image-text) learning with a mixture of multi-camera and single-camera data.

The paper tackles generalizable image-based person re-identification across unseen domains. It proposes ReText, a method that uses textual descriptions to enrich single-camera data and jointly optimizes three tasks. The authors report strong generalization and better results than state-of-the-art methods on cross-domain benchmarks.

Generalizable image-based person re-identification (Re-ID) aims to recognize individuals across cameras in unseen domains without retraining. While many existing approaches address the domain gap through complex architectures, recent findings indicate that better generalization can be achieved with stylistically diverse single-camera data. Although this data is easy to collect, it lacks complexity due to minimal cross-view variation. We propose ReText, a novel method trained on a mixture of multi-camera Re-ID data and single-camera data, where the latter is complemented by textual descriptions to enrich semantic cues. During training, ReText jointly optimizes three tasks: (1) Re-ID on multi-camera data, (2) image-text matching, and (3) image reconstruction guided by text on single-camera data. Experiments demonstrate that ReText achieves strong generalization and significantly outperforms state-of-the-art methods on cross-domain Re-ID benchmarks. To the best of our knowledge, this is the first work to explore multimodal joint learning on a mixture of multi-camera and single-camera data in image-based person Re-ID.
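
The three jointly optimized tasks can be pictured as one weighted training objective. The sketch below is illustrative only. The abstract does not give the loss functions or their weights, so the softmax cross-entropy identity loss, the symmetric contrastive matching loss, the mean-squared-error reconstruction loss, and the names `joint_loss`, `w_match`, and `w_recon` are all assumptions, not the authors' formulation.

```python
import math

def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample (assumed stand-in for the Re-ID loss)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def contrastive_matching(sim):
    """Symmetric contrastive loss over an image-text similarity matrix (assumed form).

    sim[i][j] scores image i against caption j; matched pairs lie on the diagonal.
    """
    n = len(sim)
    img_to_txt = sum(cross_entropy(sim[i], i) for i in range(n)) / n
    cols = [[sim[i][j] for i in range(n)] for j in range(n)]
    txt_to_img = sum(cross_entropy(cols[j], j) for j in range(n)) / n
    return 0.5 * (img_to_txt + txt_to_img)

def mse(pred, target):
    """Mean squared error (assumed stand-in for the text-guided reconstruction loss)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def joint_loss(reid_logits, reid_label, sim, recon, pixels,
               w_match=1.0, w_recon=1.0):
    """Weighted sum of the three task losses; the weights are hypothetical."""
    l_reid = cross_entropy(reid_logits, reid_label)  # task 1: multi-camera data
    l_match = contrastive_matching(sim)              # task 2: single-camera image-text pairs
    l_recon = mse(recon, pixels)                     # task 3: single-camera reconstruction
    return l_reid + w_match * l_match + w_recon * l_recon
```

In a real training loop the first term would be computed on a multi-camera batch and the other two on a single-camera batch with captions. How the batches are mixed is likewise not specified in the abstract.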

