CV · Apr 7, 2025

SapiensID: Foundation for Human Recognition

Minchul Kim, Dingqiang Ye, Yiyang Su, Feng Liu, Xiaoming Liu

arXiv:2504.04708v118.211 citations · h-index: 17 · CVPR

Originality: Highly original

AI Analysis

This paper addresses the challenge of robust human recognition in real-world scenarios with varying poses and scales. It represents a significant but incremental improvement over existing methods.

The paper tackles the problem of fragmented human recognition systems by introducing SapiensID, a unified model that achieves state-of-the-art results on body ReID benchmarks, outperforming specialized models and remaining competitive with face recognition systems.

Existing human recognition systems often rely on separate, specialized models for face and body analysis, limiting their effectiveness in real-world scenarios where pose, visibility, and context vary widely. This paper introduces SapiensID, a unified model that bridges this gap, achieving robust performance across diverse settings. SapiensID introduces (i) Retina Patch (RP), a dynamic patch generation scheme that adapts to subject scale and ensures consistent tokenization of regions of interest, (ii) a masked recognition model (MRM) that learns from variable token lengths, and (iii) Semantic Attention Head (SAH), a module that learns pose-invariant representations by pooling features around key body parts. To facilitate training, we introduce WebBody4M, a large-scale dataset capturing diverse poses and scale variations. Extensive experiments demonstrate that SapiensID achieves state-of-the-art results on various body ReID benchmarks, outperforming specialized models in both short-term and long-term scenarios while remaining competitive with dedicated face recognition systems. Furthermore, SapiensID establishes a strong baseline for the newly introduced challenge of Cross Pose-Scale ReID, demonstrating its ability to generalize to complex, real-world conditions.
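The abstract describes SAH only as "pooling features around key body parts" and gives no implementation details. The sketch below illustrates one generic way such pooling could work: a Gaussian-weighted average of patch tokens around each keypoint. It is an assumption for illustration, not the authors' method; the function name `pool_around_keypoints`, the `sigma` bandwidth, and the use of normalized coordinates are all hypothetical.

```python
import math


def pool_around_keypoints(tokens, centers, keypoints, sigma=0.1):
    """Pool patch tokens into one feature vector per keypoint.

    Illustrative only: Gaussian-weighted averaging is an assumed stand-in
    for "pooling features around key body parts", not the paper's SAH.

    tokens:    list of N feature vectors (each a list of D floats)
    centers:   list of N (x, y) patch centers, normalized to [0, 1]
    keypoints: list of K (x, y) body-part locations, same coordinates
    sigma:     hypothetical bandwidth; smaller means more local pooling
    """
    dim = len(tokens[0])
    pooled = []
    for kx, ky in keypoints:
        # Patches closer to the keypoint receive larger weights.
        weights = [
            math.exp(-((cx - kx) ** 2 + (cy - ky) ** 2) / (2 * sigma ** 2))
            for cx, cy in centers
        ]
        total = sum(weights)
        # Normalized weighted average of the token features.
        pooled.append([
            sum(w * tok[d] for w, tok in zip(weights, tokens)) / total
            for d in range(dim)
        ])
    return pooled


# Usage: two patches at opposite corners, two keypoints.
tokens = [[1.0, 0.0], [0.0, 1.0]]
centers = [(0.0, 0.0), (1.0, 1.0)]
features = pool_around_keypoints(tokens, centers, [(0.0, 0.0), (0.5, 0.5)])
```

Because each keypoint yields a feature tied to a body part rather than to a fixed image location, the output has the same layout regardless of where the part appears in the frame, which is the intuition behind a pose-invariant representation.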
