Person Re-ID in 2025: Supervised, Self-Supervised, and Language-Aligned. What Works?
This paper addresses the problem of cross-domain generalization in Person Re-Identification (ReID) for computer vision researchers, providing an incremental analysis of existing methods.
This paper reviews and compares supervised, self-supervised, and language-aligned training paradigms for Person Re-Identification. It finds that supervised models excel in their training domain but fail to generalize cross-domain, while language-aligned models show robust cross-domain performance despite not being explicitly trained for it.
Person Re-Identification (ReID) remains a challenging problem in computer vision. This work reviews several training paradigms, evaluates the robustness of state-of-the-art ReID models in cross-domain applications, and examines the role of foundation models in improving generalization through richer, more transferable visual representations. We compare three training paradigms: supervised, self-supervised, and language-aligned models. The study aims to answer the following questions: Can supervised models generalize in cross-domain scenarios? How do foundation models such as SigLIP2 perform on ReID tasks? What are the weaknesses of current supervised and foundation models for ReID? We conduct the analysis across 11 models and 9 datasets. Our results show a clear split: supervised models dominate their training domain but crumble on cross-domain data. Language-aligned models, however, show surprising cross-domain robustness on ReID tasks, even though they are not explicitly trained for them. Code and data are available at: https://github.com/moiiai-tech/object-reid-benchmark.
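Cross-domain ReID evaluation is usually framed as retrieval: embeddings from a model trained on one dataset are extracted for the query and gallery images of another dataset, the gallery is ranked by similarity for each query, and Rank-1 accuracy and mean Average Precision (mAP) are reported. The sketch below illustrates that idea in plain Python. It assumes cosine similarity and these two metrics, which are common in ReID but not stated in this summary. The helper names `cosine` and `evaluate_reid` are hypothetical and are not taken from the linked repository.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def evaluate_reid(
    query_feats: List[Sequence[float]],
    query_ids: List[int],
    gallery_feats: List[Sequence[float]],
    gallery_ids: List[int],
) -> Tuple[float, float]:
    """Hypothetical helper: return (rank1, mAP) for query-to-gallery retrieval.

    Assumption: a simplified protocol with no same-camera filtering.
    Queries whose identity never appears in the gallery are skipped.
    """
    rank1_hits = 0
    aps: List[float] = []
    for qf, qid in zip(query_feats, query_ids):
        # Rank every gallery item by similarity to this query, highest first.
        order = sorted(
            range(len(gallery_feats)),
            key=lambda j: cosine(qf, gallery_feats[j]),
            reverse=True,
        )
        matches = [gallery_ids[j] == qid for j in order]
        if not any(matches):
            continue  # identity absent from gallery: no valid ground truth
        if matches[0]:
            rank1_hits += 1  # top-ranked gallery item has the correct identity
        # Average precision: mean of precision@k at each correct-match position k.
        hits = 0
        precisions: List[float] = []
        for k, is_match in enumerate(matches, start=1):
            if is_match:
                hits += 1
                precisions.append(hits / k)
        aps.append(sum(precisions) / len(precisions))
    n = len(aps)
    return (rank1_hits / n, sum(aps) / n) if n else (0.0, 0.0)


# Toy example: two queries of identity 7 against a three-image gallery.
rank1, mean_ap = evaluate_reid(
    query_feats=[[1.0, 0.0], [0.0, 1.0]],
    query_ids=[7, 7],
    gallery_feats=[[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]],
    gallery_ids=[3, 7, 7],
)
print(rank1, mean_ap)
```

In the toy example, the first query's nearest gallery image belongs to the wrong identity, so Rank-1 is 0.5 across the two queries. mAP still credits the correct matches found further down the ranking. Standard benchmark protocols such as Market-1501 also exclude gallery images of the same identity taken by the query's camera. This simplified sketch omits that step.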