Dissecting Human Body Representations in Deep Networks Trained for Person Identification
This work provides insights into body identification algorithms for computer vision researchers, but it is incremental as it analyzes existing models rather than proposing new ones.
The study analyzed body image embeddings from four body identification networks trained on 1.9 million images across 4,788 identities, finding that these models encode face information, gender, view, and dataset origin, and that identification accuracy can be improved by using PCA to eliminate high-variance dimensions in the embedding space.
Long-term body identification algorithms have emerged recently with the increased availability of high-quality training data. We seek to fill knowledge gaps about these models by analyzing body image embeddings from four body identification networks trained with 1.9 million images across 4,788 identities and 9 databases. By analyzing a diverse range of architectures (ViT, SWIN-ViT, CNN, and linguistically primed CNN), we first show that the face contributes to the accuracy of body identification algorithms and that these algorithms can identify faces to some extent -- with no explicit face training. Second, we show that representations (embeddings) generated by body identification algorithms encode information about gender, as well as image-based information including view (yaw) and even the dataset from which the image originated. Third, we demonstrate that identification accuracy can be improved without additional training by operating directly and selectively on the learned embedding space. Leveraging principal component analysis (PCA), identity comparisons were consistently more accurate in subspaces that eliminated dimensions that explained large amounts of variance. These three findings were surprisingly consistent across architectures and test datasets. This work represents the first analysis of body representations produced by long-term re-identification networks trained on challenging unconstrained datasets.