CV · Apr 29

AttriBE: Quantifying Attribute Expressivity in Body Embeddings for Recognition and Identification

Basudha Pal, Siyuan Huang, Anirudh Nanduri, Zhaoyang Wang, Rama Chellappa

arXiv:2604.27218 · 64.2

Predicted impact: top 51% in CV · last 90 days
Originality: Incremental advance

AI Analysis

For researchers and practitioners in person re-identification, this work provides a method to measure attribute bias in learned embeddings, which is important for fairness and model understanding.

The paper introduces AttriBE, a framework that uses mutual information to quantify how strongly attributes such as gender, pose, and BMI are encoded in person re-identification embeddings. The authors find that BMI is the most expressive attribute in the deeper layers of transformer models, and that pose (pitch in particular) becomes comparable to BMI in cross-spectral settings.

Person re-identification (ReID) systems that match individuals across images or video frames are essential in many real-world applications. However, existing methods are often influenced by attributes such as gender, pose, and body mass index (BMI), which vary in unconstrained settings and raise concerns related to fairness and generalization. To address this, we extend the notion of expressivity, defined as the mutual information between learned features and specific attributes, using a secondary neural network to quantify how strongly attributes are encoded. Applying this framework to three transformer-based ReID models on a large-scale visible-spectrum dataset, we find that BMI consistently shows the highest expressivity in deeper layers. Attributes in the final representation are ranked as BMI > Pitch > Gender > Yaw, and expressivity evolves across layers and training epochs, with pose peaking in intermediate layers and BMI strengthening with depth. We further extend the analysis to cross-spectral person identification across infrared modalities including short-wave, medium-wave, and long-wave infrared. In this setting, pitch becomes comparable to BMI and attribute trends increase monotonically across depth, suggesting increased reliance on structural cues when bridging modality gaps. Overall, the results show that transformer-based ReID embeddings encode a hierarchy of implicit attributes, with morphometric information persistently embedded and pose contributing more strongly under cross-spectral conditions.
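
To make the notion of expressivity concrete, the following is a minimal sketch of estimating the mutual information between a feature and an attribute. It is not the paper's method: the abstract describes a secondary neural network as the estimator, whereas this sketch swaps in a simple histogram (plug-in) estimator, and it assumes a single scalar feature and a binary attribute. All names (`mutual_information`, `quantize`, `feature_strong`, `feature_weak`) and the synthetic data are hypothetical and chosen only for illustration.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in nats from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (count / n) * math.log(count * n / (px[x] * py[y]))
    return mi


def quantize(values, bins=8):
    """Map continuous values to equal-width bin indices in [0, bins - 1]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal
    return [min(int((v - lo) / width), bins - 1) for v in values]


# Synthetic stand-in data (an assumption, not taken from the paper):
# a binary attribute and two scalar "features", one that depends on the
# attribute and one that is pure noise.
rng = random.Random(0)
attribute = [rng.randint(0, 1) for _ in range(2000)]
feature_strong = [2.0 * a + rng.gauss(0.0, 0.5) for a in attribute]
feature_weak = [rng.gauss(0.0, 1.0) for _ in attribute]

# "Expressivity" of the attribute in each feature, in nats.
strong = mutual_information(quantize(feature_strong), attribute)
weak = mutual_information(quantize(feature_weak), attribute)
print(f"strong feature: {strong:.3f} nats, weak feature: {weak:.3f} nats")
```

A feature that tracks the attribute yields a mutual information well above that of the noise feature, which is the sense in which one attribute can be ranked as more expressive than another. Histogram estimators of this kind scale poorly to high-dimensional embeddings, which is one reason a learned estimator would be preferred in practice.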
