ShARc: Shape and Appearance Recognition for Person Identification In-the-wild
The paper addresses the problem of identifying individuals in uncontrolled environments for biometric analysis, with incremental advances in feature aggregation.
The paper tackles person identification in unconstrained video by proposing ShARc, a multimodal approach that uses 3-D body shape, pose, and appearance, and reports significant improvements over state-of-the-art methods on the CCVID, MEVID, and BRIAR datasets.
Identifying individuals in unconstrained video settings is a valuable yet challenging task in biometric analysis due to variations in appearance, environment, degradation, and occlusion. In this paper, we present ShARc, a multimodal approach for video-based person identification in uncontrolled environments that emphasizes 3-D body shape, pose, and appearance. We introduce two encoders: a Pose and Shape Encoder (PSE) and an Aggregated Appearance Encoder (AAE). PSE encodes body shape and pose from binarized silhouettes, skeleton motions, and 3-D body shape, while AAE provides two levels of temporal appearance feature aggregation: attention-based feature aggregation and averaging aggregation. For attention-based feature aggregation, we employ spatial and temporal attention to focus on the regions and frames most useful for distinguishing people. For averaging aggregation, we introduce a novel flattening layer after averaging to extract more discriminative information and reduce overfitting of the attention mechanism. For gallery registration, we use centroid feature averaging. We demonstrate significant improvements over existing state-of-the-art methods on public datasets, including CCVID, MEVID, and BRIAR.
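The two temporal aggregation modes and the centroid-based gallery registration described above can be illustrated with a minimal NumPy sketch. It is not ShARc's implementation and rests on several assumptions: the attention pooling is a plain softmax over frames driven by a hypothetical scoring vector `w`, standing in for the learned spatial and temporal attention modules; the paper's flattening layer is not reproduced; and matching a query by cosine similarity to identity centroids is an assumed rule. All function names are hypothetical.

```python
import numpy as np


def l2_normalize(x: np.ndarray, axis: int = -1, eps: float = 1e-12) -> np.ndarray:
    """Scale vectors to unit L2 norm along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def attention_aggregate(frames: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Temporal attention pooling of per-frame features (assumed form).

    frames: (T, D) per-frame appearance features.
    w: (D,) hypothetical scoring vector; frame t gets score frames[t] @ w,
       and a softmax over the T scores gives the frame weights.
    """
    scores = frames @ w                    # (T,) one score per frame
    scores = scores - scores.max()         # shift for numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()
    return alpha @ frames                  # (D,) attention-weighted sum


def average_aggregate(frames: np.ndarray) -> np.ndarray:
    """Plain temporal averaging; ShARc's flattening layer is NOT modeled here."""
    return frames.mean(axis=0)


def register_gallery(tracklets: dict[str, list[np.ndarray]]) -> dict[str, np.ndarray]:
    """Centroid registration: average the unit-norm embeddings of each identity."""
    return {
        pid: l2_normalize(np.mean([l2_normalize(e) for e in embs], axis=0))
        for pid, embs in tracklets.items()
    }


def identify(query: np.ndarray, gallery: dict[str, np.ndarray]) -> str:
    """Return the identity whose centroid has the highest cosine similarity."""
    q = l2_normalize(query)
    return max(gallery, key=lambda pid: float(q @ gallery[pid]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D = 8
    w = rng.normal(size=D)  # hypothetical attention scoring vector
    protos = {"A": 3 * np.eye(D)[0], "B": 3 * np.eye(D)[1]}  # toy identities

    def toy_tracklet(p: np.ndarray) -> np.ndarray:
        return p + 0.1 * rng.normal(size=(5, D))  # 5 noisy frames around p

    gallery = register_gallery(
        {pid: [attention_aggregate(toy_tracklet(p), w) for _ in range(2)]
         for pid, p in protos.items()}
    )
    query = average_aggregate(toy_tracklet(protos["B"]))
    print(identify(query, gallery))  # expected: B
```

Collapsing each identity's registered tracklets into a single centroid keeps the gallery at one vector per person, so matching a query costs one similarity computation per identity rather than one per registered tracklet.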